Liver Lesion Segmentation Using Deep Learning Models
Cancer is the second leading cause of death worldwide, accounting for an estimated 9.6 million deaths, or one in every six deaths, in 2018. Lung, prostate, colorectal, stomach, and liver cancer are the most common types in men, while breast, colorectal, lung, cervical, and thyroid cancer are the most common in women. The primary goals of medical image segmentation include studying anatomical structure, identifying regions of interest (RoI), and measuring tissue volume to track tumor growth. Liver lesions must be diagnosed and treated quickly to stop the tumor from spreading further. Deep learning-based liver segmentation has become very popular in medical image analysis. This study surveys deep learning-based liver lesion segmentation algorithms and methodologies and contrasts them in terms of the models developed, their performance, and their limitations. It concludes that liver lesion segmentation, and the segmentation of small lesions in particular, remains an open research problem for computer-aided systems because a number of technical issues are still unresolved.
Liver cancer is one of the most prevalent cancers in the world and has a high mortality rate. Medical imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET) are the gold standard for diagnosing liver diseases such as cirrhosis, liver cancer, and fulminant hepatic failure. Among them, CT, which offers a good signal-to-noise ratio and excellent resolution, is currently the modality most frequently used for diagnosing and treating liver lesions or tumors. Accurate identification of liver cancer, together with knowledge of the shape, volume, and location of the lesion, can lead to more effective patient care. Manual segmentation requires clinicians to delineate liver lesions slice by slice, which is time-consuming and error-prone. As a result, accurate and automatic segmentation of the liver and hepatic lesions is required for computer-assisted diagnosis of liver disease and for planning liver transplant surgery.
Segmentation is the technique of clearly delineating an organ of interest on a multi-planar computed tomography (CT) or magnetic resonance imaging (MRI) image for volumetric or morphological analysis. Although many deep learning-based models have been created, segmenting liver lesions is still an active field of research. Several survey papers on the segmentation of the liver have been published. To the best of our knowledge, however, there are few survey publications on the segmentation of liver lesions.
This study critically analyzes published work on liver lesion segmentation using deep learning models. It compares the models in terms of the proposed architecture, datasets, performance, and drawbacks, and presents the major challenges encountered when segmenting liver lesions. It concludes that computer-aided liver lesion segmentation is still an open research problem, especially for small lesions, because the available techniques have many limitations that remain to be addressed.
2. Liver Lesion Segmentation Models
This section summarizes recent studies on medical image analysis for the segmentation of liver lesions or tumors. Most of this research is built on supervised learning algorithms that use labeled inputs to train a model for a particular task, such as liver or liver lesion segmentation. Deep learning techniques extend these strategies. The remainder of this section discusses deep learning models that have been created for segmenting liver lesions or tumors.
Christ et al. created a model for automatically segmenting the liver and lesions in abdominal CT images using cascaded fully convolutional neural networks (CFCNs) and dense 3D conditional random fields (CRFs). The cascaded FCNs were trained in two steps: a first FCN was trained to segment the liver as a region of interest (ROI), which then served as input to a second FCN. The second FCN segments lesions within the liver ROI predicted in the previous stage. A dense 3D CRF is then applied as a post-processing step to refine the segmentation produced by the CFCN. The CFCN models were trained on abdominal CT scans from the 3DIRCAD dataset using two-fold cross-validation. The results show that CFCN-based semantic segmentation achieved a Dice score of 94.3% for liver segmentation.
Sun et al. put forth a fully convolutional network-based model, known as multi-channel FCN (MC-FCN), for the segmentation of liver lesions from CT scans. Because each phase of a contrast-enhanced CT scan provides distinct information, the model trains one channel per phase and fuses their high-level features. The model was evaluated on two CT datasets, 3DIRCADb and JDRD. According to the reported results, MC-FCN improves the volume overlap error (VOE), relative volume difference (RVD), average symmetric surface distance (ASD), and maximum symmetric surface distance (MSD) compared with the earlier model.
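Two of the overlap metrics named above, VOE and RVD, can be illustrated with a minimal sketch. It assumes the commonly used definitions, VOE = 1 - |A∩B| / |A∪B| and RVD = (|A| - |B|) / |B| for a predicted mask A and a reference mask B, and represents masks as flat sequences of 0/1 values. The function name `voe_and_rvd` is hypothetical and is not taken from any of the surveyed papers. The surface-distance metrics (ASD, MSD) are omitted.

```python
def voe_and_rvd(pred, ref):
    """Return (VOE, RVD) in percent for two binary masks.

    pred, ref: flat sequences of 0/1 values of equal length
    (an assumed representation chosen to keep the sketch dependency-free).
    """
    if len(pred) != len(ref):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    union = sum(1 for p, r in zip(pred, ref) if p or r)
    n_pred = sum(1 for p in pred if p)
    n_ref = sum(1 for r in ref if r)
    if n_ref == 0:
        raise ValueError("reference mask must not be empty")
    voe = 100.0 * (1.0 - intersection / union)   # 0 means perfect overlap
    rvd = 100.0 * (n_pred - n_ref) / n_ref       # > 0 means over-segmentation
    return voe, rvd


# Toy example: the prediction is shifted by one voxel relative to the reference.
voe, rvd = voe_and_rvd([0, 1, 1, 1, 0, 0], [0, 0, 1, 1, 1, 0])
```

Note that RVD can be zero even when the overlap is poor, as in the toy example, which is why it is usually reported together with an overlap metric such as VOE or Dice.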
Given the restricted GPU capacity and training data, Xiao suggested a 2.5D deep CNN model that accepts a stack of neighboring slices as input and produces a 2D segmentation map corresponding to the center slice. The 32-layer deep CNN combines the long-distance skip connections of UNet with the short-distance residual connections of ResNet, which aid information flow both forward and backward through the network and improve model performance. The intensity values of the input images were clipped to the range [-200, 200] HU to remove irrelevant image detail; no further specific pre-processing was performed. To reduce the overall computation time, two networks with the same design were trained on the LiTS dataset: the first produces a rapid but coarse segmentation of the liver, and the second produces a more detailed segmentation map of the liver and liver lesions. The Dice score for this approach was 0.67, which leaves room for improvement in lesion segmentation accuracy.
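The 2.5D input described above can be sketched as follows. This is an illustration only: the number of neighboring slices, the repetition of border slices at the ends of the volume, and the function names `clip_hu` and `stack_neighbors` are assumptions and are not taken from the original work. Volumes are plain nested lists indexed as [z][y][x].

```python
def clip_hu(slice_2d, low=-200, high=200):
    """Clip every intensity of a 2D slice (list of rows) to [low, high] HU."""
    return [[min(max(v, low), high) for v in row] for row in slice_2d]


def stack_neighbors(volume, center, n_neighbors=2):
    """Build a 2.5D input: the center slice plus n_neighbors slices on each side.

    Indices that fall outside the volume are clamped, so border slices are
    repeated (an assumed edge-handling choice).
    """
    last = len(volume) - 1
    indices = [min(max(center + offset, 0), last)
               for offset in range(-n_neighbors, n_neighbors + 1)]
    return [clip_hu(volume[i]) for i in indices]


# Toy volume with five 1x1 slices holding -600, -300, 0, 300, and 600 HU.
volume = [[[z * 300 - 600]] for z in range(5)]
stack = stack_neighbors(volume, center=2, n_neighbors=1)
```

The network would then predict a 2D segmentation map for the center slice only, so a full volume is segmented by sliding the stack along the z-axis.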
Fully convolutional networks with a VGG-16 architecture find it challenging to learn more discriminative features for the various classes because of limitations on adding further layers. In addition, the wide receptive fields of the convolutional filters in FCNs lead to coarse outputs at lesion edges. To overcome these limitations, Bi et al. used ResNet for the liver lesion segmentation task. ResNet's residual skip connections between convolutional layers address the decay of training accuracy in deeper networks, enabling additional layers to be inserted to learn more discriminative features. In addition, a cascaded ResNet architecture with multi-scale fusion gradually learns and infers the borders of the liver and liver lesions, allowing the model to define boundaries more precisely. The model was trained and tested on the LiTS dataset. The network's weights are updated iteratively via stochastic gradient descent (SGD) to minimize the overall per-pixel loss. The best segmentation results were obtained with the cascaded ResNet with multi-scale fusion: the Dice scores for liver and liver lesion segmentation improved by 3.94% and 20.13%, respectively, over the VGG-Net-based FCN architecture.
For the ISBI 2017 Liver Tumor Segmentation Challenge (LiTS), Chlebus et al. submitted a method that uses a 2D U-Net network and a random forest classifier to automatically segment liver lesions. Liver segmentation is carried out first so that the network focuses only on ROIs that may contain liver tumors. A trained neural network identifies lesion candidates within the liver ROI, and a random forest classifier then refines the candidates to produce the final lesion segmentation. The liver mask is obtained in two steps: first with an ensemble of three orthogonal 2D neural networks built on the U-Net architecture, and then with a 3D U-Net that refines the result. Trained and evaluated on the LiTS dataset, the architecture obtains a Dice score of 0.65. The DICOM rescale parameters are used to convert the voxel values to Hounsfield units, and the convolutional networks' padding therefore uses a fill value of -1000 HU. The final liver mask is the largest connected component of the 3D U-Net soft-max output thresholded at 0.5.
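The final post-processing step, thresholding at 0.5 and keeping the largest connected component, can be sketched in a few lines. The sketch assumes 6-connectivity, a strict comparison against the threshold, and a probability volume stored as nested lists indexed [z][y][x]; none of these details are stated in the source, and the function name `largest_component` is hypothetical.

```python
from collections import deque

# Offsets of the six face-adjacent neighbors of a voxel (assumed connectivity).
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def largest_component(prob, threshold=0.5):
    """Threshold a 3D probability volume and keep its largest component as a 0/1 mask."""
    nz, ny, nx = len(prob), len(prob[0]), len(prob[0][0])
    seen = set()
    best = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if prob[z][y][x] <= threshold or (z, y, x) in seen:
                    continue
                # Breadth-first search collects one connected component.
                component, queue = [], deque([(z, y, x)])
                seen.add((z, y, x))
                while queue:
                    cz, cy, cx = queue.popleft()
                    component.append((cz, cy, cx))
                    for dz, dy, dx in NEIGHBORS:
                        n = (cz + dz, cy + dy, cx + dx)
                        inside = 0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                        if inside and n not in seen and prob[n[0]][n[1]][n[2]] > threshold:
                            seen.add(n)
                            queue.append(n)
                if len(component) > len(best):
                    best = component
    mask = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    for z, y, x in best:
        mask[z][y][x] = 1
    return mask


# Toy volume (1 x 1 x 4): two foreground voxels on the left, one isolated on the right.
mask = largest_component([[[0.9, 0.8, 0.1, 0.7]]])
```

In practice, an array library with a connected-component labeling routine would replace the explicit search; the pure-Python version is used here only to keep the example self-contained.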
Vorontsov et al. put forward a model for the joint segmentation of the liver and liver tumors in CT images. The model consists of two cascaded FCNs that take 2D axial slices as input and are trained end to end as a single-stage model. Each FCN has a U-Net-like structure with short- and long-range skip connections. The first FCN takes an axial slice as input, and its output is passed to a linear classifier that produces, for each pixel, the probability that it belongs to the liver. The output of the first FCN is then passed to the second FCN. The model was assessed on the LiTS dataset. The researchers did not pre-process the data and applied only minimal post-processing to the predictions. The model achieved Dice scores of 0.661 for tumor segmentation and 0.951 for liver segmentation. The segmentation performance could be further enhanced by modifying the model to process the complete CT volume rather than individual slices.
Most models created for the segmentation of liver tumors use 2D or 3D FCNs as their main building blocks. 2D FCNs cannot fully utilize spatial information along the third dimension, whereas 3D convolutions incur high computational cost and GPU memory consumption, which limits the network depth as well as the filters' field of view. Li et al. created a hybrid densely connected UNet (H-DenseUNet) to circumvent the drawbacks of 2D and 3D convolutions. H-DenseUNet consists of a 2D DenseUNet for extracting intra-slice features and a 3D part for hierarchically incorporating volumetric context when segmenting the liver and tumors. The learning process is designed end to end, with a hybrid feature fusion layer that jointly optimizes the inter- and intra-slice representations. The model achieved a global Dice score of 82.4% on the LiTS dataset and a Dice per case score of 72.2% when tested on the 3DIRCAD dataset.
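The difference between the global Dice score and the Dice per case score can be made concrete with a small sketch. It assumes the usual convention that Dice per case averages the Dice score over individual volumes, while global Dice pools the voxels of all volumes before computing a single score. The function names and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice(pred, ref):
    """Dice score of two binary masks given as flat sequences of 0/1 values."""
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Assumed convention: two empty masks count as a perfect match.
    return 2.0 * intersection / total if total else 1.0


def dice_per_case(cases):
    """Average the Dice score over cases; each case is a (pred, ref) pair."""
    return sum(dice(pred, ref) for pred, ref in cases) / len(cases)


def dice_global(cases):
    """Pool the voxels of all cases and compute a single Dice score."""
    all_pred = [v for pred, _ in cases for v in pred]
    all_ref = [v for _, ref in cases for v in ref]
    return dice(all_pred, all_ref)


# One case with a large, well-segmented lesion and one with a missed small lesion.
cases = [([1, 1, 1, 1], [1, 1, 1, 1]),
         ([1, 0, 0, 0], [0, 1, 0, 0])]
per_case = dice_per_case(cases)
pooled = dice_global(cases)
```

In this toy example the per-case score is lower than the global score because the missed small lesion counts as a full case in the average but contributes few voxels to the pooled score. This is one reason why global Dice tends to favor methods that do well on large lesions.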
Chen et al. introduced a 2D segmentation architecture, the Feature Fusion Encoder-Decoder network, to segment liver lesions. The technique employs an attention mechanism in which low-level features holding image details are fused with high-level features conveying semantic information. To compensate for the details lost during up-sampling, a dense up-sampling convolution replaces the typical up-sampling operations. Residual convolutional blocks are included to further improve the target boundary information. The approach achieved a global Dice score of 0.766 and a Dice per case score of 0.650. Compared with 2.5D and 3D models, the proposed architecture shows highly promising results. It is also lightweight and simple to deploy, with good generalization properties that make it easily transferable to other domains.
Karsten et al. suggested a pipeline made up of 2D U-Nets with dense connections and a Tversky loss function. The researchers first trained a regular U-Net to segment the liver and then trained a densely connected U-Net with a Tversky loss function to segment liver tumors. The cascaded pipeline was adopted to shorten the training time. The LiTS dataset was used for training and evaluation. In the LiTS competition, the proposed architecture outperformed the competitors in terms of relative volume difference (RVD), average symmetric surface distance (ASSD), maximum symmetric surface distance (MSSD), and volume overlap error (VOE). This suggests that the proposed architecture segments liver lesions effectively once tumors are reliably detected. The architecture could be improved further by addressing remaining issues such as its high rate of false positive predictions.
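For reference, a Tversky loss can be sketched as below. The sketch assumes the common formulation TI = TP / (TP + alpha * FP + beta * FN) with loss 1 - TI, computed on soft predictions. The values of `alpha` and `beta` and the smoothing term `eps` are illustrative assumptions; the source does not report the settings used in the surveyed pipeline.

```python
def tversky_loss(probs, targets, alpha=0.3, beta=0.7, eps=1e-7):
    """Tversky loss for flat sequences of predicted probabilities and 0/1 targets.

    alpha weights false positives and beta weights false negatives
    (illustrative values); alpha = beta = 0.5 reduces to the Dice loss.
    """
    tp = sum(p * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


# With beta > alpha, a missed lesion voxel is penalized more than a false alarm.
loss = tversky_loss([1.0, 0.0, 0.0, 0.0], [1, 1, 0, 0])
```

Choosing beta larger than alpha pushes the network towards higher recall, which is attractive for small lesions but may also increase the number of false positive predictions.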
Jiang et al. created an attention-based hybrid network that incorporates long and short skip connections as well as soft and hard attention mechanisms. The researchers also suggested a cascaded network that performs liver localization, liver segmentation, and liver lesion segmentation. They used a joint Dice loss function to train the liver localization network, producing accurate 3D bounding boxes for the liver, and developed a focal binary cross-entropy loss function to fine-tune the lesion segmentation network. The network was trained on 110 cases from the LiTS dataset and then evaluated on 117 cases from a clinical dataset and on the 3DIRCADb dataset. The results on the test data show that the network segments liver lesions with a Dice score of 0.62 ± 0.07 and converges faster.
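A focal binary cross-entropy loss can be illustrated with the widely used form FL = -a_t (1 - p_t)^gamma log(p_t), where p_t is the probability assigned to the true class. The exact variant and hyperparameters used by Jiang et al. are not given in the source, so `gamma`, `alpha`, and the function name `focal_bce` below are assumptions.

```python
import math


def focal_bce(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean focal binary cross-entropy over flat sequences.

    probs: predicted foreground probabilities; targets: 0/1 labels.
    gamma down-weights easy examples; alpha balances the two classes
    (both values are illustrative assumptions).
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)            # avoid log(0)
        p_t = p if t == 1 else 1.0 - p             # probability of the true class
        a_t = alpha if t == 1 else 1.0 - alpha     # class-balancing weight
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# A confidently correct voxel contributes far less than an uncertain one.
easy = focal_bce([0.9], [1])
hard = focal_bce([0.6], [1])
```

Setting gamma to 0 recovers a class-weighted binary cross-entropy, which makes the effect of the focusing term easy to check in isolation.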
Xi et al. created a cascaded ResUNet that segments the liver and liver lesions simultaneously. Using the cascaded ResUNet, the researchers assessed five loss functions: Weighted Cross Entropy (WCE), Dice Loss (DL), Weighted Dice Loss (WDL), Tversky Loss (TL), and Weighted Tversky Loss (WTL). They then ensembled the cascaded ResUNet models trained with the five loss functions to segment the liver and liver lesions simultaneously on the CT volume. The ensembled model was trained and tested on the LiTS dataset. According to the experimental results, it outperforms the individual models, obtaining Dice scores of 94.9% for liver segmentation and 75.2% for liver lesion segmentation.
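The source does not state how the five models were combined. One common choice, shown here purely as an assumption, is to average the per-voxel probabilities of the individual models and threshold the mean. The function name `ensemble_mean` and the threshold of 0.5 are hypothetical.

```python
def ensemble_mean(prob_maps, threshold=0.5):
    """Average per-voxel probabilities from several models and threshold the mean.

    prob_maps: list of flat probability sequences, one per model, all of the
    same length. Returns a flat 0/1 mask.
    """
    n_models = len(prob_maps)
    mean = [sum(values) / n_models for values in zip(*prob_maps)]
    return [1 if m >= threshold else 0 for m in mean]


# Three hypothetical models that disagree on the last voxel.
mask = ensemble_mean([[0.9, 0.2, 0.6],
                      [0.8, 0.4, 0.3],
                      [0.7, 0.1, 0.5]])
```

Averaging tends to suppress errors made by a single model, which is consistent with the reported gain of the ensemble over the individual models, although other schemes such as majority voting are equally plausible.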
To address the problems with the traditional UNet, Seo et al. modified it by introducing object-dependent up-sampling and altering the residual path and skip connections. The modified UNet avoids duplicating low-resolution feature information, extracts higher-level global features for smaller objects, and extracts high-level features of high-resolution edge information for larger objects. The model was evaluated on the LiTS dataset, where it achieved a Dice score of 89.72% and outperformed the compared methods. Compared with the traditional UNet, the modified UNet handles the edge information and morphological information of objects more successfully.
3. Major Challenges
There is a strong demand for precise and automatic liver and tumor segmentation to aid clinicians in diagnosis and treatment planning as liver cancer is one of the leading causes of cancer mortality today.
The majority of researchers noted the following difficulties in segmenting the liver and lesions:
a) Low intensity contrast between the liver and nearby organs.
b) Large variation in the size, shape, location, and number of tumors within a patient, which makes liver tumors challenging to segment.
c) Indistinct borders of some tumors, which limit the effectiveness of segmentation approaches.
d) Anisotropic voxel dimensions in most CT images, with significant variation along the z-axis, which makes segmentation even more difficult.
Figure 1 shows examples of contrast-enhanced CT scans that demonstrate the wide variation in the size, shape, and position of liver lesions. Red regions represent the liver, and green regions represent liver lesions.
In medical imaging, lesion segmentation is challenged by two kinds of imbalance: the imbalance between the lesion and non-lesion classes, and the imbalance in tumor size, where larger tumors dominate smaller ones when multiple tumors appear in a single scan. A number of strategies address the class imbalance problem, but the tumor size imbalance problem has received less attention. As a result, many of these techniques either fail to segment smaller tumors or produce suboptimal results. To overcome the challenge of small lesion segmentation, Li et al. devised a three-layer curriculum learning technique for deep neural networks. Dey and Hong suggested a cascaded network that incorporates both 2D and 3D CNNs to segment hepatic lesions effectively. In this network, a 2D network segments the liver slice by slice and detects larger lesions, while a 3D network detects the small lesions that a 2D segmentation model frequently misses. The model obtains a Dice score of 68.1% on the LiTS dataset. Shirokikh et al. presented a loss reweighting strategy to improve the ability of a deep learning network to detect small lesions. Further research is required on the segmentation of small liver lesions with these networks. A network is needed that concentrates on small and large lesions at the same time and improves the model's overall performance.
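To illustrate the idea of reweighting a loss by lesion size, the sketch below assigns each foreground voxel a weight inversely proportional to the size of the lesion it belongs to, so that every lesion contributes the same total weight. This is a simplified illustration under stated assumptions and is not necessarily the scheme proposed by Shirokikh et al. The input format (a flat sequence of lesion identifiers, with 0 for background) and the function name `inverse_size_weights` are hypothetical.

```python
from collections import Counter


def inverse_size_weights(lesion_ids, background_weight=1.0):
    """Return one loss weight per voxel.

    lesion_ids: flat sequence with 0 for background and k > 0 for voxels that
    belong to lesion k (for example, connected-component labels).
    Each lesion receives the same total weight regardless of its size, and the
    foreground weights sum to the number of foreground voxels.
    """
    sizes = Counter(i for i in lesion_ids if i != 0)
    n_lesions = len(sizes)
    total_fg = sum(sizes.values())
    weights = []
    for i in lesion_ids:
        if i == 0:
            weights.append(background_weight)
        else:
            weights.append(total_fg / (n_lesions * sizes[i]))
    return weights


# Lesion 1 has four voxels and lesion 2 has one voxel.
weights = inverse_size_weights([0, 1, 1, 1, 1, 2, 0, 0])
```

The resulting weights would multiply a per-voxel loss such as cross-entropy, so that the single voxel of the small lesion carries as much total weight as the four voxels of the large lesion.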
In medical images, the anatomy or region of interest typically occupies only a small portion of the scan. As a result, the learning process frequently becomes stuck in local minima of the cost function, yielding a model whose predictions are heavily biased towards the background rather than the foreground. Foreground regions are hence frequently missed or only partially detected. Segmentation performance is also degraded when a scan contains several lesions and large lesions dominate smaller ones, so smaller lesions are frequently ignored.
Segmentation of smaller lesions is crucial for early-stage cancer diagnosis, clinical tracking of disease progression, and evaluation of treatment response. If smaller lesions go undetected, medical professionals may diagnose a disease incorrectly or miss it completely in its early stages. Table 1 summarizes the liver lesion segmentation methods covered in the preceding sections, along with the models suggested, their performance, the datasets used, and any drawbacks.
This work analyzes deep learning models for segmenting liver lesions and notes some significant difficulties encountered when segmenting them. Future research will focus on overcoming the drawbacks of segmenting liver lesions of different sizes.
Table 1. Summary of liver lesion segmentation methods: models, datasets, performance, and drawbacks.

| Author | Dataset | Model | Performance | Drawbacks |
| --- | --- | --- | --- | --- |
| Christ et al. | 3DIRCAD | Cascaded FCNs with dense 3D conditional random fields (CRFs) | Dice (%) for liver=94.3; VOE (%)=10.7; RVD (%)=-1.4; ASD (mm)=1.5; MSD (mm)=24.0 | The method is complex and time-consuming, as it adopts two FCNs and requires an additional post-processing step using CRFs. |
| Sun et al. | 3DIRCAD and JDRD | Multi-channel FCN to segment liver lesions from multiphase contrast-enhanced CT scans | 3DIRCAD: VOE=15.6 ± 4.3%; RVD=5.8 ± 3.5%; ASD=2.0 ± 0.9 mm. JDRD: VOE=8.1 ± 4.5%; RVD=1.7 ± 1.0%; ASD=1.5 ± 0.7 mm | |
| Xiao | LiTS | Deep CNN model with 32 layers that operates in 2.5D | Dice=0.67; VOE=0.45; RVD=0.040; ASSD=6.660; MSSD=57.93 | The accuracy of lesion segmentation needs further improvement, and the training time is too long. |
| Bi et al. | LiTS | Cascaded deep residual network with the ability to add more layers | Liver Dice=95.90; Lesion Dice=50.01; Liver Jaccard=92.19; Lesion Jaccard=38.79 | The method excels in liver segmentation but does not achieve the ideal effect in segmenting liver lesions. |
| Chlebus et al. | LiTS | 2D U-Net network with random forest classifier | Accuracy=90%; Dice=0.65 | |
| Vorontsov et al. | LiTS | Two cascaded FCNs with U-Net-like structure, having short- and long-range skip connections | Dice (liver segmentation)=0.951; Dice (liver lesion segmentation)=0.661 | The model must process input slice by slice, which degrades the performance. |
| Li et al. | LiTS and 3DIRCAD | Hybrid densely connected UNet | Dice=0.937 ± 0.02; VOE (%)=11.68 ± 4.33; RVD (%)=-0.01 ± 0.05; ASD (mm)=0.58 ± 0.46; RMSD (mm)=1.87 ± 2.33 | The model cannot segment small liver tumors effectively. |
| Chen et al. | | Feature fusion encoder-decoder network with residual blocks | Dice per case=0.650; Dice global=0.766 | |
| Karsten et al. | LiTS | 2D Tiramisu network with Tversky loss function | Dice Avg=0.57; Dice global=0.66 | The false prediction rate is very high. |
| Jiang et al. | LiTS and 3DIRCAD | Attention-based hybrid network | Dice=0.62 ± 0.07; VOE (%)=1.354; RVD (%)=0.129; ASSD (mm)=1.074; MSSD (mm)=6.271; RMSD (mm)=1.412 | The liver lesion segmentation is inefficient. |
| Xi et al. | LiTS | Ensembled model with cascaded ResUNet | Liver lesion segmentation: Dice=75.2% | |
| Seo et al. | LiTS and 3DIRCAD | Modified UNet with residual path and object-dependent up-sampling | Dice=89.72%; VOE=21.93%; RVD=-0.49% | The loss function, MSE, does not adequately capture structural similarity. |
The data used to support the research findings are available from the corresponding author upon request.
The authors declare conflicts of interest.