Deep Learning-Based MRI Classification for Early Diagnosis of Alzheimer’s Disease
Abstract:
Alzheimer’s disease (AD), a progressive neurodegenerative disorder characterized by severe cognitive decline, necessitates early and accurate diagnosis to improve patient outcomes. Recent advancements in deep learning (DL), particularly convolutional neural networks (CNNs), have demonstrated significant potential in medical image analysis (MIA). This study presents a robust CNN-based framework for the classification of AD using magnetic resonance imaging (MRI) data. The proposed methodology incorporates contrast stretching for image preprocessing, followed by principal component analysis (PCA) and recursive feature elimination (RFE) for feature selection, to enhance the discriminative power of the model. The framework is designed to classify MRI scans into four distinct categories: non-demented, very mildly demented, mildly demented, and moderately demented. Experimental validation on a comprehensive dataset reveals that the proposed approach achieves exceptional performance, with a validation accuracy of 97% and a training accuracy of 100%, alongside reduced loss and improved sensitivity. The integration of PCA and RFE is shown to effectively reduce dimensionality while retaining diagnostically critical features, thereby optimizing the model’s efficiency and interpretability. These findings underscore the potential of DL techniques to revolutionize the early detection and diagnosis of AD, offering a powerful tool for clinical decision-making and advancing the field of neuroimaging analysis. The proposed framework not only addresses the challenges of high-dimensional data but also provides a scalable and generalizable solution for the classification of neurodegenerative disorders.
1. Introduction
Doctors can use artificial intelligence (AI) to help them diagnose patients more quickly and accurately. It may predict a disease's risk beforehand, enabling its prevention. To analyze medical data and treat diseases, researchers can employ deep learning (DL) (Helaly et al., 2022). However, medical image analysis (MIA) can be a laborious and complex process. To identify Alzheimer's disease (AD) early, a DL model was used in this study.
DL (sometimes referred to as deep structured learning or hierarchical learning) is an AI function that mimics how the human brain works to process data and generate patterns for use in decision-making (So et al., 2019). Within AI, DL is a subset of machine learning (ML) whose networks can learn without supervision from unstructured or unlabelled data. It is also known as deep neural learning, and its networks are called deep neural networks (DNNs).
Human memory operates in stages. Working memory comes first; it is responsible for maintaining focus and attention while information is received. Short-term memory (STM) comes next; STM preserves information for slightly more than a 24-hour period. Finally, for durations longer than a day, observed events are recorded and stored in long-term memory (LTM) (Jiang et al., 2022). AD, a degenerative brain disease, severely affects memory, reasoning, and the ability to perform even the most basic tasks.
AD is a degenerative disease of the brain and nervous system that worsens over time (Wen et al., 2020). The incidence of AD is rising annually as the global population ages. As the condition worsens, older adults with AD face a number of symptoms of brain damage, including progressive memory loss, mobility issues, a decline in language expression, and cognitive challenges (Logan et al., 2021). Magnetic resonance imaging (MRI) of the brain is typically used to assess this stage of neuropsychiatric symptom development. The progression of AD is shown in Figure 1.

In recent years, research has focused on utilizing ML to diagnose AD from data such as MRI. This technology has expedited the medical process and made the task of medical experts easier. The objective of this study is to use a convolutional neural network (CNN) to classify AD images (Ben Ahmed et al., 2015). CNN, a multilayer neural network, is the most traditional and widely used DL framework, and CNN-based DL is the state-of-the-art (SOTA) method for image classification (Poloni et al., 2021).
The study aims to address the problem of model accuracy and data sensitivity. Additionally, it uses a CNN to categorize four classes of AD, as shown in Figure 2. To that end, the recursive feature elimination (RFE) approach was applied and principal component analysis (PCA) was incorporated into the CNN. PCA is a commonly utilized technique whose aim is to reduce dataset dimensionality (AbdulAzeem et al., 2021). It enables the model to visualize the data simply and to train more quickly. In this work, PCA was used to confirm whether or not the features are independent of one another. With the support of PCA, the least informative features were eliminated and new independent features were produced from the original ones.

RFE aims to find the best features in a dataset and works like a greedy optimization technique (Fu'adah et al., 2021). To do this, the model's underlying DL algorithm is fitted, the features are ranked according to importance, the least important features are removed, and the model is then re-fitted.
2. Literature Review
Islam & Zhang (2018) proposed applied ensemble DL models combining CNNs with other architectures. It enhances the performance in complex tasks like image classification, MIA, and more. MRI data was used to diagnose AD. MRI scans from relevant datasets were obtained, providing various imaging modalities such as T1-weighted images, which are crucial for AD diagnosis. High-level features were automatically extracted from the MRI data using a CNN. By recognizing patterns like shrinkage in particular brain regions, the CNN layers were able to recognize AD. To guarantee uniformity throughout the dataset, one of the preprocessing stages involved normalizing the MRI intensity values. PCA helps in reducing the dimensionality of the extracted features. The computational cost was also reduced via PCA; it also emphasized the most relevant features. For classification tasks, the more comprehensive and effective solution was offered by these models, because these models utilize the robustness of several methods.
Basaia et al. (2019) proposed a 3D CNN model trained on structural MRI data. It is considered to be an effective method for evaluating volumetric medical images. Comprehensive data regarding the brain structure were offered by the following activities: disease detection, brain segmentation, and structural MRI scans. The 3D CNN model lowered the computational cost and maintained the spatial data by reducing the dimensionality of the feature maps. To perform tasks like regression or classification, 3D CNNs usually have fully connected layers that use the features after feature extraction. The entire volume of MRI data was analyzed by the 3D CNN model, effectively capturing the spatial relationships among various brain regions, which 2D CNN models fail to capture. Compared to 2D models, the 3D CNN models are effective and attain better classification accuracy in tasks like AD detection, brain tumor classification, and brain segmentation. The problems related to computational demands and data requirements must be managed carefully.
Ortiz et al. (2016) proposed a 3D CNN model with transfer learning (TL) applied to brain MRI data. Through task adaptation, TL uses a model that has already been trained on a large dataset. This leverages the learned features and weights from the pre-trained model, reducing the need for extensive training on the new dataset. While parts of the pre-trained model are modified and retrained on the target dataset (brain MRI), other parts remain fixed, adapting the model to the specific features of MRI data. For 3D CNNs, TL might involve models pre-trained on large 3D datasets. In practice, pre-trained 3D models for medical images are less common than 2D models, but there are specialized models available for tasks such as brain segmentation. TL in 3D CNNs typically involves extracting features from pre-trained models and fine-tuning the network on MRI data to improve performance for specific tasks. TL helps in achieving higher accuracy by leveraging pre-trained features that capture general image patterns, which are fine-tuned to the specific characteristics of MRI data. It results in improved accuracy, reduced training time, and better feature extraction, though it comes with challenges related to data compatibility, computational demands, model interpretability, and robustness in classification.
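As a hedged illustration of this freeze-and-fine-tune workflow (a sketch, not Ortiz et al.'s exact setup), a 2D ImageNet backbone can stand in for a pre-trained model in Keras; the input resolution and class count below are assumptions:

# Minimal transfer-learning sketch (illustrative, not the cited authors' setup).
# A 2D ImageNet backbone stands in for the pre-trained model; grey-level MRI
# slices would be replicated to 3 channels. 3D medical backbones would follow
# the same freeze-and-fine-tune pattern.
from tensorflow import keras

base = keras.applications.MobileNetV2(
    input_shape=(176, 176, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pre-trained feature extractor

model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(4, activation="softmax"),  # assumed 4 AD classes
])
model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])

Only the small dense head is trained at first; the backbone can later be unfrozen with a low learning rate for fine-tuning.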
To classify AD using MRI data, a deep CNN model was created by Tariq et al. (2019). The goal was to create a CNN model capable of accurately classifying MRI scans into different AD-related classes, which usually entails extracting features from MRI scans to build a classifier that can differentiate between disease states. The input layer accepts the MRI. While 3D CNNs deal with volumetric data, 2D CNNs often handle images as slices or projections. The convolutional layers use either 2D or 3D convolutions to extract hierarchical features from the MRI; many convolutional layers with varied filter sizes were used to collect information at various levels. After feature extraction, fully connected layers combined the gathered features to determine the final classification. The output layer uses an activation function (AF), such as softmax for multi-class classification (MCC) or sigmoid for binary classification, to produce class probabilities. The objective of this strategy is to provide accurate and dependable MRI classification for the diagnosis and understanding of AD by utilizing cutting-edge approaches and ongoing improvement.
Kong et al. (2022) applied a 3D CNN model to functional magnetic resonance imaging (fMRI) data for AD detection. This method involves leveraging the spatiotemporal information within fMRI volumes to identify patterns associated with the disease. By identifying variations in blood flow, which represent neural activity in various brain areas, fMRI measures brain activity. Unlike structural MRI, which captures the anatomy of the brain, fMRI provides insight into brain function and connectivity. fMRI data consists of 3D volumes over time, making it a 4D dataset. Each voxel represents a small cube of brain tissue, and the signal intensity indicates the level of neural activity. AD affects brain function, disrupting neural activity and connectivity; fMRI is employed to detect these variations, making it an effective tool for early diagnosis. Typically, a softmax AF is used in the final layer to produce class probabilities. The application of this technique has demonstrated significant potential in clinical settings, particularly for early diagnosis and monitoring of disease progression, although it requires careful attention to data preparation, model construction, and the challenges of working with high-dimensional, temporal data.
By combining CNN with a recurrent neural network (RNN), Li et al. (2019) showed how to diagnose AD by utilizing the advantages of both architectures for the analysis of neuroimaging data, such as MRI or fMRI. While RNNs, especially long short-term memory (LSTM) networks or gated recurrent units (GRUs), are effective at capturing temporal dependencies, CNNs are good at extracting spatial features from images. The purpose of CNNs is to automatically and adaptively learn spatial feature hierarchies from input images. In neuroimaging, CNNs are especially good at finding patterns in brain scans, such as fMRI activation patterns or structural abnormalities in MRI. RNNs are designed to retain information about previous inputs in order to identify patterns in sequential data. In the context of neuroimaging, RNNs can be used to analyze sequences of brain scans over time, capturing temporal dynamics that may indicate the progression of AD. This provides a powerful tool for diagnosing AD by leveraging the spatial and temporal characteristics of brain scans. Improved clinical decision-making, improved monitoring of disease progression, and earlier and more accurate diagnosis are all possible outcomes of this hybrid approach.
To enhance CNN performance on MRI data, the data augmentation approach was used by Singh et al. (2022). This method decreases overfitting, increases the robustness of the model, and improves CNNs' capacity for generalization. Medical datasets, including MRI scans, are often limited in size due to the difficulty of acquiring and labelling medical images, a limitation that can lead to overfitting in CNN models. MRI scans also vary because of different scanning protocols, patient positioning, and anatomical differences, and data augmentation helps CNNs learn to handle this variability effectively. By expanding the variety of the training dataset, data augmentation is a potent method to improve CNN generalization. Carefully applying spatial, intensity-based, and advanced augmentation methods allows models to achieve higher accuracy and robustness, making them more effective for clinical diagnosis and research in the context of AD and other neurological disorders, as illustrated below.
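A sketch of the spatial and intensity-based augmentations described here, using standard Keras preprocessing layers (the parameter values are illustrative, not those of the cited study):

# Illustrative spatial/intensity augmentation pipeline in Keras
# (a sketch of the general approach, not the cited study's exact parameters).
from tensorflow import keras

augment = keras.Sequential([
    keras.layers.RandomFlip("horizontal"),       # spatial: mirror slices
    keras.layers.RandomRotation(0.05),           # spatial: small rotations
    keras.layers.RandomTranslation(0.05, 0.05),  # spatial: positioning variability
    keras.layers.RandomContrast(0.1),            # intensity: scanner variability
])

# Applied on the fly during training, e.g.:
# train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))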
Amoroso et al. (2018) used DL models for AD classification from MRI data, integrating preprocessing, feature extraction, and DL architectures into a thorough procedure. Memory loss, cognitive decline, and behavioural abnormalities are symptoms of AD, a degenerative neurological condition. For the condition to be managed, an early and precise diagnosis is essential. MRI is a non-invasive imaging method that produces fine-grained structural images of the brain, which are valuable for detecting brain atrophy and other abnormalities associated with AD. The main challenges include the complexity of the brain's anatomy, variability in MRI scans, and the need for robust preprocessing and feature extraction methods to maximize the effectiveness of DL models. By addressing the challenges of small sample sizes, high dimensionality, and the need for interpretability, these models have the ability to significantly improve early diagnosis and treatment of AD, ultimately impacting patient outcomes and advancing research in the field of neuroimaging.
In order to improve diagnostic accuracy, Khvostikov et al. (2018) suggested utilizing DL-based multi-modal fusion, which combines MRI and positron emission tomography (PET) data for AD detection by integrating complementary information from both imaging modalities. MRI and PET provide different types of information, structural and functional, and combining them can lead to a more comprehensive understanding of the disease. By fusing data from both modalities, the model can detect subtle changes in the brain that might not be evident from a single modality, leading to improved diagnostic accuracy, particularly in early stages like mild cognitive impairment (MCI). A hybrid approach that involves both feature- and decision-level fusion was implemented. Features from MRI and PET were first fused, and the combined feature set was then employed for training a DL model, whose output was further refined by an ensemble of decision-level fusion methods. The future goal is to use graph neural networks (GNNs) to model the relationships between different brain regions as captured by MRI and PET, providing a more holistic view of how AD affects brain connectivity. With the ability to improve clinical practice and patient outcomes, it makes use of the advantages of both imaging modalities and DL to offer a greater understanding of the disease.
Using structural MRI data, Jain et al. (2019) created a CNN-based DL framework for the early detection of AD. In order to detect small neurological changes linked to the early stages of AD, this method makes use of CNNs' capacity to automatically extract pertinent features from high-dimensional image data.
Early detection of AD is critical as it enables interventions that can slow disease progression, improve quality of life, and provide time for planning and treatment. Structural magnetic resonance imaging (sMRI) is one of the most informative imaging modalities for capturing anatomical changes in the brain, such as hippocampal atrophy, which are early indicators of AD. The CNN-based framework can be integrated into clinical workflows, providing radiologists and neurologists with an advanced tool for the early detection of AD, potentially leading to earlier interventions. It achieves a high sensitivity with improved accuracy.
3. Methodology
Image classification, a core mechanism in computer vision (CV), has been extensively researched and applied across various domains. It usually has three main parts: image preprocessing, image feature extraction and selection, and classification (Lu & Weng, 2007). A collection of 10,432 JPEG images of patients in four categories (mildly demented, moderately demented, non-demented, and very mildly demented) was used to implement DL features for the classification of AD. The model was created using the Python programming language and the Keras and TensorFlow libraries in order to speed up the execution of the DL algorithms, and the system was backed by an NVIDIA graphics processing unit (GPU).
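A minimal Keras loading setup consistent with this description might look as follows; the directory name, image size, and batch size are assumptions for illustration (the 70/30 split matches the evaluation described in Section 4):

# Sketch of loading the four-class JPEG dataset with Keras; the folder path
# "alzheimer_mri/" (one subfolder per class) and the image size are assumptions.
from tensorflow import keras

train_ds = keras.utils.image_dataset_from_directory(
    "alzheimer_mri/", validation_split=0.3, subset="training",
    seed=42, image_size=(176, 176), batch_size=32)
val_ds = keras.utils.image_dataset_from_directory(
    "alzheimer_mri/", validation_split=0.3, subset="validation",
    seed=42, image_size=(176, 176), batch_size=32)

print(train_ds.class_names)  # the four dementia categories, from folder names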
Figure 3, Figure 4, Figure 5, and Figure 6 illustrate the four classes of the 10,432 images in the dataset, which was made available on Kaggle for testing purposes.




Getting higher-quality images is the goal of image preprocessing. In this case, the input image quality (IQ) was improved by the application of contrast stretching, which was employed in this work because it preserves the final image's shape without causing damage. The contrast stretching approach (Widodo et al., 2016) is a subset of the point processing method, which implies that it depends only on the intensity of a single pixel and not on the pixels surrounding it. The image's grey level range is presumed to be between 0 and 255. With breakpoints $(a_1, b_1)$ and $(a_2, b_2)$, the transformation is a straight line that leaves the grey levels unaltered if $a_1 = b_1$ and $a_2 = b_2$ (Radha & Tech, 2012); however, when $a_1 < a_2$ and $b_1 < b_2$, the method stretches the grey levels. The three functions used to calculate contrast stretching are described in Eqs. (1)-(3). The contrast stretching function is given in Figure 7.

For $0 \leq f_i(x, y) < a_1$, then:
For $a_1 \leq f_i(x, y) < a_2$, then:
For $a_2 \leq f_i(x, y)<255$, then:
The preprocessing MATLAB screens are shown in Figure 8.

Following the preprocessing step, the retrieved data was normalized to zero mean and unit variance using a standard scaler function. Normalization was conducted in order to eliminate irregularities in the data that hamper analysis. Eq. (4) provides the normalized matrix elements, where $\mu_j$ and $\sigma_j$ denote the mean and standard deviation of feature $j$:

$\hat{x}(i, j) = \dfrac{x(i, j) - \mu_j}{\sigma_j}$ (4)
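A minimal scikit-learn sketch of this normalization step, using a random placeholder feature matrix:

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 64)       # placeholder matrix (samples x features)
scaler = StandardScaler()
X_norm = scaler.fit_transform(X)  # x_hat(i, j) = (x(i, j) - mu_j) / sigma_j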
The feature selection approach reduces the dimension size and selects the best features by combining the PCA and RFE methods.
A large number of features in the analysis process often leads to dimensionality issues. An effective way to deal with this problem is PCA. In basic terms, PCA translates the information in $d$-dimensional space to a $k$-dimensional subspace, where $k < d$, and produces linear combinations of the initial features (Tonin et al., 2024). The $k$ variables obtained are the principal components (PCs). Each PC captures the maximum variance not already accounted for by the preceding components, so the first component covers more variance than any subsequent component. Eq. (5) can be used to determine PCs:

$PC_i = a_{i1} X_1 + a_{i2} X_2 + \cdots + a_{id} X_d$ (5)
where,
$PC_i$ denotes the $i^{\text{th}}$ PC; $X_j$ represents the original feature $j$; and $a_{ij}$ represents the numerical coefficient of $X_j$ in $PC_i$.
PCA is the most widely used technique of this kind; however, it only reduces the dimensionality of the features. When a feature selection technique is used, the model operates solely on the selected features, and no transformation is made (Aker, 2022). Here, PCA is used first to reduce the feature dimensionality, and the RFE feature selection method is then used to choose the most significant features, as sketched below.
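A minimal scikit-learn sketch of this two-stage selection; because scikit-learn's RFE requires an estimator that exposes feature weights, a logistic regression stands in here as the ranking model (an assumption for illustration), and the data are random placeholders:

# PCA-then-RFE feature selection sketch. The logistic regression is only a
# stand-in ranking estimator; the paper selects features for its CNN.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X = np.random.rand(200, 512)          # placeholder flattened image features
y = np.random.randint(0, 4, 200)      # placeholder labels for the 4 AD classes

pca = PCA(n_components=50)            # map d-dimensional data to k < d dimensions
X_pca = pca.fit_transform(X)          # columns are the PCs of Eq. (5)

rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=20)
X_sel = rfe.fit_transform(X_pca, y)   # keep the 20 most informative PCs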
RFE is an example of a wrapper feature selection algorithm (Chen, 2003). This approach builds models using a variety of subsets of the input features and chooses the best one based on performance metrics. The combined PCA and RFE model is shown in Figure 9.

Algorithm 1: Process of RFE
Step 1: Fit the model on the training set using all predictors.
Step 2: Compute the model's performance.
Step 3: Calculate the importance-based ranking of the variables.
Step 4: For each subset size, preserve the most important variables.
Step 5: Retrain the model on the training set using those predictors.
Step 6: Determine the model's performance.
Step 7: Compute the performance profile over the subset sizes.
Step 8: Choose the appropriate number of predictors.
Step 9: Use the model corresponding to the best subset.
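A compact Python sketch of this loop, with a random-forest ranking model and placeholder data as illustrative assumptions:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = np.random.rand(200, 30)        # placeholder feature matrix
y = np.random.randint(0, 4, 200)   # placeholder labels

def recursive_elimination(X, y, min_features=5):
    kept = list(range(X.shape[1]))                  # Step 1: start with all predictors
    best_score, best_subset = -np.inf, kept[:]
    while len(kept) > min_features:
        model = RandomForestClassifier(random_state=0).fit(X[:, kept], y)
        score = cross_val_score(model, X[:, kept], y, cv=3).mean()   # Steps 2/6
        if score > best_score:                      # Step 7: track the profile
            best_score, best_subset = score, kept[:]
        weakest = int(np.argmin(model.feature_importances_))         # Step 3: rank
        kept.pop(weakest)                           # Steps 4/5: drop weakest, refit
    return best_subset, best_score                  # Steps 8/9: best subset/model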
Once the sMRI features of both the cortical and subcortical regions were retrieved, the normalization and feature selection stages were applied to the composite features from both regions (Gupta et al., 2019). Once the features have been chosen, they can be utilized to distinguish AD from other groups. This method is demonstrated in Figure 10. A CNN is proposed to complete the classification task.

In mathematics, convolution is a crucial analytical operation. This mathematical operator takes two functions, $f$ and $g$, and creates a third function that expresses the area of overlap between $f$ and a reversed, translated copy of $g$. Its calculation is often defined by Eq. (6):

$(f * g)(n) = \sum_{\tau} f(\tau)\, g(n - \tau)$ (6)
Eq. (7) contains its integral form as follows:

$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$ (7)
In image processing, a digital image, represented by the symbol $f(x, y)$, can be considered a discrete function on a 2D space. Assuming a 2D convolution kernel $g(x, y)$, the output image $z(x, y)$ can be represented by Eq. (8):

$z(x, y) = f(x, y) * g(x, y)$ (8)
Thus, following the feature extraction and selection preprocessing, the convolution operation can be utilized to extract the image features. Similarly, in DL applications, when the input is a color image with RGB channels made up of individual pixels, it is a high-dimensional array of size 3 × image width × image length. The kernel's parameters are defined by the learning algorithm (Jogin et al., 2018); in a CNN, the kernel is referred to as the "convolution kernel", and the computational parameters form another high-dimensional array. The analogous convolution operation for 2D image input can therefore be represented by Eq. (9):

$(f * g)(x, y) = \sum_{s} \sum_{t} f(s, t)\, g(x - s, y - t)$ (9)
Eq. (10) contains the integral form as follows:

$(f * g)(x, y) = \iint f(s, t)\, g(x - s, y - t)\, ds\, dt$ (10)
Given a convolution kernel of size $m \times n$, Eq. (11) becomes:

$G(x, y) = \sum_{s=0}^{m-1} \sum_{t=0}^{n-1} f(x - s, y - t)\, g(s, t)$ (11)
where $f$ stands for the input image, $G$ for the output, and $m$ and $n$ indicate the size of the convolution kernel. Algorithms frequently implement convolution as a matrix product. Assume the convolution kernel is $n \times n$ in size and the image is $M \times M$. This is analogous to extracting each $n \times n$ image region and expressing it as a column vector of length $n^2$; during computation, each such image region is multiplied by the convolution kernel (Vasudevan et al., 2017).
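Eq. (11) can be written directly in a few lines of NumPy; the following didactic sketch flips the kernel as true convolution requires (libraries such as SciPy or TensorFlow provide far faster implementations), and the edge-detection kernel in the usage comment is chosen purely for illustration:

import numpy as np

def conv2d(f, g):
    """Valid-mode 2D convolution of image f with an m x n kernel g, per Eq. (11)."""
    m, n = g.shape
    H, W = f.shape
    out = np.zeros((H - m + 1, W - n + 1))
    g_flip = g[::-1, ::-1]  # true convolution flips the kernel
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            out[x, y] = np.sum(f[x:x + m, y:y + n] * g_flip)
    return out

# e.g., a 3 x 3 Laplacian-style kernel over a grey-level slice:
# edges = conv2d(mri_slice.astype(float),
#                np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]]))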
The CNN-based image classification model is shown below:
a) Input: The training set consists of a set of $N$ images, each labelled with one of the $K$ classification tags.
b) Learning: This step involves learning the precise features of each class using the training set. This stage is commonly referred to as learning a model or a training classifier.
c) Evaluation: The classifier is used to predict the classification feature vectors of images that it has not previously examined, and its quality is evaluated by comparing the predicted labels with the images' true labels; the more predictions that agree with the ground truth, the better the classifier. Figure 11 shows the classification output.

A feature vector, which can be described as a fixed-length vector, must first be extracted from the image; a classifier is then used to classify it. From input to output, a typical CNN consists of the following layers: input layer, convolutional layer, activation layer, pooling layer, fully connected layer, and final output layer (Yamashita et al., 2018). The successive convolution-pooling structure decodes, deduces, converges, and maps the feature signals of the original data into the hidden-layer feature space, while the CNN layers create the relationships between the computational neural nodes and transfer input information layer by layer (Shocher et al., 2020). The fully connected layer then uses the extracted features for classification and output.
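A minimal Keras model following this layer order (input, convolution, activation, pooling, fully connected, softmax output); the filter counts, kernel sizes, and input resolution are illustrative assumptions rather than this study's exact configuration:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(176, 176, 1)),        # grey-level MRI slice (assumed size)
    layers.Conv2D(32, 3, activation="relu"), # convolution + activation
    layers.MaxPooling2D(),                   # pooling
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),    # fully connected layer
    layers.Dense(4, activation="softmax"),   # output layer: 4 AD classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])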
Therefore, AD is diagnosed by applying the CNN, and the images are classified by the proposed model using these image classification techniques. This can yield high classification accuracy without changing the shape of the images, together with improved sensitivity for positive findings in patients with the disease.
4. Results and Discussion
The model identified the four classes of AD, i.e., mildly demented, moderately demented, non-demented, and very mildly demented. Anomalies were removed during feature extraction, which helps clinicians make faster and easier diagnoses of the disease and results in improved accuracy and sensitivity. The data were split into 70% for training and 30% for validation. The results include a validation accuracy of 97%, a validation loss of 0.0832, a training loss of 0.0012, and a training accuracy of 100%.
The evaluation metrics are as follows:
Accuracy is a metric for evaluating classification models that measures how well a model performs across all classes by calculating the ratio of correct predictions to total predictions. In terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), accuracy is represented in Eq. (12):

$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$ (12)
Sensitivity is the ability of a diagnostic test to accurately determine whether a person has a disease, i.e., a test's capacity to return positive results for individuals who have the condition. Eq. (13) represents sensitivity:

$\text{Sensitivity} = \dfrac{TP}{TP + FN}$ (13)
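Both metrics can be computed directly from predicted and true labels; a small sketch with placeholder arrays:

import numpy as np

y_true = np.array([0, 1, 2, 3, 1, 0])   # placeholder ground-truth labels
y_pred = np.array([0, 1, 2, 1, 1, 0])   # placeholder model predictions

def accuracy(y_true, y_pred):
    return np.mean(y_true == y_pred)                        # Eq. (12)

def sensitivity(y_true, y_pred, positive):
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    return tp / (tp + fn)                                   # Eq. (13)

print(accuracy(y_true, y_pred), sensitivity(y_true, y_pred, positive=1))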
Figure 12 shows the training accuracy, comparing the other classification system (OCS) algorithm and the CNN algorithm based on image classification. The training accuracy of the CNN is 100%, reached after 19 epochs on the training dataset.

Figure 13 shows the validation accuracy, comparing the OCS algorithm and the CNN algorithm based on image classification. The validation accuracy of the CNN is 97%, reached after 19 epochs of training.

Based on image classification, Figure 14 compares the training loss of the CNN and OCS algorithms. Training loss is a metric that assesses how well a model fits the training data by evaluating the model's error on the training set. The CNN has a training loss of 0.0012, which is lower than that of the OCS algorithm.

Based on image classification, Figure 15 shows the validation loss, contrasting the losses of the CNN and OCS algorithms. Validation loss is a metric used to evaluate a DL model's performance on the validation set. The CNN has a validation loss of 0.0832, which is lower than that of the OCS method.

Figure 16 compares the performance of two methods, OCS and CNN, in terms of sensitivity across different numbers of iterations. The results show that the CNN method consistently achieves higher sensitivity than OCS, with a faster and smoother growth trend. In contrast, OCS shows a slower increase with noticeable fluctuations. Ultimately, CNN nearly reaches 100% sensitivity, while OCS remains significantly lower. These findings indicate the superior accuracy and efficiency of CNN compared to OCS in this experiment.

5. Conclusions
This study proposed DL-based image classification for AD using a CNN. The methodology includes preprocessing, feature extraction, feature selection, and classification. In the preprocessing step, contrast stretching was used to enhance the images while keeping their shape unchanged from the original. After preprocessing, feature extraction was used to remove anomalies in the data. The integration of PCA and RFE was employed to reduce the feature dimensionality and select the best features. Finally, a CNN was implemented to diagnose AD early through image classification. This classification improved accuracy on the dataset and also resulted in high sensitivity. Accuracy was assessed on training and validation sets, with the data split into training (70%) and validation (30%); the validation and training losses were also reported. The expected improvement was attained for both training and validation accuracy. The resulting accuracy supports image classification for AD diagnosis without altering the original dataset, and the sensitivity indicates how reliably the classifier detects patients affected by the disease.
The data used to support the research findings are available from the corresponding author upon request.
The authors declare no conflict of interest.
