Automated Identification of Insect Pests: A Deep Transfer Learning Approach Using ResNet
In agriculture, yields of staple crops such as rice, wheat, maize, soybeans, and sugarcane are adversely affected by insect pest invasions, leading to significant reductions in agricultural output. Traditional manual identification of these pests is labor-intensive and time-consuming, underscoring the need for an automated early detection and classification system. Recent advances in machine learning, particularly deep learning, have provided robust methods for classifying and detecting a diverse array of insect infestations in crop fields. However, inaccurate pest classification can lead to the use of inappropriate pesticides, further endangering both agricultural yields and the surrounding ecosystems. In light of this, nine pre-trained deep learning models were evaluated for their ability to detect and classify insect pests accurately. The assessment used two prevalent datasets comprising ten pest classes of varied sizes. Among the transfer learning techniques examined were adaptations of ResNet-50 and ResNet-101. ResNet-50, when employed in a transfer learning setting, achieved a classification accuracy of 99.40% in the detection of agricultural pests. Such a high level of accuracy represents a significant advance in precision agriculture.
In response to the projected growth of the global population to an estimated 10 billion by the mid-21st century, it has been recognized that a substantial increase in food production is imperative. Crop afflictions caused by various pests lead to disease and destruction, which in turn result in diminished yields. Chemical pesticides have been the conventional response to pest management, albeit with detrimental consequences for human health and environmental integrity. Consequently, agricultural scientists globally have advocated integrated pest management (IPM) strategies as alternatives to chemical pesticides. Pest recognition systems, underpinned by computer vision and machine learning, have emerged as pivotal in the early detection and management of pests within agricultural, forestry, and wider ecological domains. These systems enable timely interventions and thus reduce pest-induced damage. They are crucial in fostering sustainable agricultural practices and in the stewardship of ecosystems, as they facilitate more precise and environmentally considerate pest control measures, which are essential for food security and for reducing reliance on pesticides.
Research into pest recognition systems is burgeoning, propelled by the imperative for sustainable pest management solutions. This research encompasses the development of sophisticated machine learning algorithms and the implementation of sensor technologies. The traditional method of manual identification is fraught with issues such as human error, subjectivity, inefficiency, limited scalability, bias, inconsistency, high costs, and intensive resource requirements. These challenges underscore the necessity for automated processes, leveraging advancements in machine learning, computer vision, and natural language processing to augment efficacy, reduce error margins, and manage voluminous data sets with enhanced proficiency.
Numerous autonomous pest recognition systems based on machine learning algorithms have been developed. In a notable study, Wu et al. compiled an extensive insect dataset, IP102, which encompasses 102 classes and more than 75,000 images, and evaluated it with various machine learning and deep learning methods. Turkoglu et al. presented a multi-model approach, both individual and hybrid, for the detection of diseases and pests in apple plants, using an integrated model that combines pre-trained convolutional neural networks with Long Short-Term Memory (LSTM) networks.
In related efforts, Kasinathan et al. introduced a machine learning-based insect identification and classification method using Support Vector Machine (SVM), Naïve Bayes (NB), K-Nearest Neighbors (KNN), and Artificial Neural Network (ANN) classifiers. This approach involved feature extraction from the Wang and Xie datasets and incorporated the GrabCut algorithm for insect detection, thus bypassing the need for manual extraction.
Transfer learning, in which deep learning models pre-trained on extensive datasets are fine-tuned on smaller target datasets, is well documented. This technique affords several advantages, such as savings in time and resources and improved performance, particularly when labeled data for the new task is scarce. Among the many transfer learning strategies, the appropriate choice depends on the specifics of the task and on the pre-trained models available.
This study proposes an automated method for classifying insects into ten categories without requiring manual segmentation or feature extraction from images in the pre-processing phase, thereby enhancing automation compared to some existing methods.
The primary contributions of this study are articulated as follows: Firstly, a novel system utilizing deep learning for the automated identification and classification of insects into ten categories is presented. Secondly, an analysis and validation of the transfer learning concept are provided, specifically for the ResNet model.
The subsequent structure of this document is as follows: Section 2 elucidates the proposed methodology and the deep neural networks engaged in this study. Section 3 delineates the experimental procedures, datasets employed, and the results thereof. Concluding remarks are presented in Section 4.
2. Materials and Methods
In agricultural, forestry, and environmental management, pest recognition systems are instrumental. These systems use advanced technologies to identify and manage pest populations accurately, affording numerous advantages to farmers, land stewards, and ecosystems. The salient benefits of applying pest recognition systems are as follows:
(1) Early detection facilitates the identification of pests at incipient stages, often before damage to plants or ecosystems becomes observable. Such early identification permits timely interventions, limiting the spread of pests and the magnitude of potential harm.
(2) Precision in pest control is improved, enabling pest management measures to be applied with greater specificity. Consequently, this obviates the need for indiscriminate pesticide applications, fostering an environmentally sustainable approach to pest control.
(3) More accurate pest detection and subsequent intervention reduce pesticide use. This brings a cascade of benefits, encompassing monetary savings for agricultural practitioners, fewer environmental contaminants, and the safeguarding of agricultural workers and consumers.
(4) The efficacy of pest recognition systems is instrumental in safeguarding crop yields by mitigating pest-induced detriments. This safeguarding is pivotal for the sustenance of food security and the economic resilience of agricultural pursuits.
(5) The alignment of pest recognition systems with sustainable agricultural principles is marked, promoting practices that uphold environmental stewardship and economic sustainability over the long term.
(6) The systems provide data-driven insights, which serve as a basis for informed decision-making in pest management strategies.
(7) The optimization of pest control measures facilitated by these systems translates into substantial cost efficiencies for agriculturalists and landowners alike.
(8) Resistance to pesticides is a growing concern, and targeted pest control strategies made viable through these systems can play a significant role in managing this resistance.
(9) The health of ecosystems, particularly within the domains of forestry and natural resource management, is bolstered through the mitigation of invasive species and detrimental pests, contributing to ecological resilience.
(10) Remote monitoring capabilities inherent in some pest recognition systems, through the integration of drones or IoT devices, afford expansive area surveillance with minimal manual intervention.
(11) Adaptability and scalability are intrinsic to these systems, enabling their application across diverse pest types and geographic locales. This flexibility is essential in the face of emergent pest threats or evolutionary changes within extant populations.
(12) Centralization and dissemination of pest-related data facilitate collaborative pest management endeavours, unifying researchers, agriculturists, and consortia in the pursuit of efficacious control strategies.
Convolutional Neural Networks (CNNs) have emerged as a potent tool in the identification and classification of pests, with proven efficacy across agricultural, forestry, and environmental monitoring applications. The design of CNNs makes them particularly adept at extracting salient features from image data, so that they serve as a cornerstone technology in pest recognition tasks.
The dataset proposed by Deng et al., comprising photographic images of ten distinct pest categories, has been used in previous studies. Visual examples from each pest category are shown in Figure 1, while Table 1 lists the class distribution of the 'Small Dataset' as used for pest classification.
Table 1. Pest classes and number of images per class in the Small Dataset (recoverable entries: Spodoptera exigua larva, Gypsy moth larva, Laspeyresia pomonella larva, and total number of images).
The application of transfer learning to the classification of insect imagery is illustrated in Figure 2. Within this framework, the final layers of pre-trained neural networks are adjusted so that the network outputs ten distinct classes. Modifying these terminal layers and fine-tuning them on the target data is expected to strengthen the network's discriminative power and thereby enhance performance.
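To make the head-replacement step concrete, the following minimal sketch counts the parameters involved when the 1000-way ImageNet classifier of ResNet-50 is exchanged for a ten-way layer. It assumes the standard ResNet-50 layout, in which global average pooling yields a 2048-dimensional feature vector. The helper name `linear_params` is hypothetical, and the study does not report which layers were frozen, so the figures below describe only the classifier layer itself.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Number of weights plus biases in one fully connected layer."""
    return in_features * out_features + out_features


FEATURE_DIM = 2048        # pooled feature size of the standard ResNet-50 (assumption)
IMAGENET_CLASSES = 1000   # size of the original ImageNet classifier
PEST_CLASSES = 10         # ten pest categories, as in the study

old_head = linear_params(FEATURE_DIM, IMAGENET_CLASSES)
new_head = linear_params(FEATURE_DIM, PEST_CLASSES)

print(f"original classifier parameters: {old_head}")
print(f"ten-class classifier parameters: {new_head}")
```

Under these assumptions, the replacement layer is two orders of magnitude smaller than the original classifier, which is one reason why adapting only the final layers is feasible with limited labeled data.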
Transfer learning is a pivotal technique in the development of pest recognition systems, particularly when labeled data specific to the task is scarce. The process is as follows. First, the task is defined precisely. Next, a collection of labeled images of the pests of interest is compiled, with annotations covering both the presence and absence of pests and, where pertinent, the exact species. This dataset should be representative, encompassing various life stages and environmental contexts. A model previously trained on a comprehensive dataset, such as ResNet-50 or ResNet-101 pre-trained on ImageNet, is then selected. The data are pre-processed by resizing, normalizing, and augmenting the images to conform to the input format expected by the chosen model. Data augmentation techniques may include, but are not limited to, random cropping, rotation, and flipping.
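The pre-processing and augmentation steps described above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the image is a small nested list of RGB pixels instead of a real photograph, the mean and standard deviation are the values commonly used with ImageNet-pretrained models, and all function names are hypothetical. Rotation is omitted for brevity.

```python
import random

# Channel statistics commonly used with ImageNet-pretrained models (assumption).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize(image):
    """Scale 0-255 RGB pixels to [0, 1], then standardize each channel."""
    return [
        [
            tuple((v / 255.0 - m) / s
                  for v, m, s in zip(pixel, IMAGENET_MEAN, IMAGENET_STD))
            for pixel in row
        ]
        for row in image
    ]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def random_crop(image, size, rng):
    """Cut a size x size window at a random position."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def augment(image, size, rng):
    """Random crop, flip with probability 0.5, then normalize."""
    out = random_crop(image, size, rng)
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    return normalize(out)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
image = [[(x * 10, y * 10, 128) for x in range(8)] for y in range(8)]  # toy 8x8 image
sample = augment(image, 6, rng)
print(len(sample), len(sample[0]), len(sample[0][0]))
```

In practice the crop size would match the input resolution expected by the chosen network, and a dedicated image library would replace the nested lists.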
The pre-trained model is then altered to suit the specific task of pest recognition. The model is trained on the labeled pest dataset, applying transfer learning principles by initializing it with the pre-trained weights and refining these on the dataset in question. The training process is monitored, employing strategies such as early stopping to mitigate overfitting. Model performance is evaluated on a separate validation set, and hyperparameters are tuned based on the validation results. The trained model is finally assessed on a dedicated test dataset to determine its generalization capability, and metrics of accuracy and reliability are computed to gauge its performance. Should the model satisfy predefined performance thresholds, it is deployed for real-time or batch inference within the pest recognition framework, potentially integrated with remote sensing technologies such as drones or stationary cameras.
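The early-stopping strategy mentioned above can be sketched as a simple rule applied to the validation loss after each epoch. The patience value, the function name `early_stopping_epoch`, and the loss history below are hypothetical and serve only to illustrate the mechanism; the study does not report its stopping criterion.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # The loss kept improving (or patience was never exhausted).
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses, one value per epoch.
history = [0.90, 0.60, 0.45, 0.47, 0.46, 0.48, 0.50]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
print(f"stop after epoch {stop_epoch}, restore weights from epoch {best_epoch}")
```

The weights saved at `best_epoch` would normally be restored before the final evaluation on the test set.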
The principle of transfer learning leverages the knowledge acquired from expansive image datasets by pre-trained models, applying it to the specialized domain of pest detection and classification. This approach has the potential to reduce markedly the volume of labeled data required to construct an effective pest recognition system.
In the study presented, transfer learning has been utilized within deep neural networks to facilitate the classification of insect pests. Transfer learning is characterized as a strategy whereby knowledge harnessed by a pre-established model is repurposed to solve analogous problems. The training of deep neural networks traditionally necessitates extensive datasets, considerable computational time, and processing power.
It is posited that the integration of transfer learning, particularly with pre-trained deep neural networks, constitutes an advanced, cost-effective alternative for classification challenges in scenarios marked by data and processing resource scarcity. The selection of an appropriate pre-trained neural network model is pivotal within the transfer learning paradigm. This critical choice is contingent upon the similarity of the existing problem to the task for which the model was initially trained. A heightened risk of overfitting is posited when the source dataset employed for training exhibits significant congruence in size and composition with the target dataset.
Within the domain of convolutional neural network architectures, the ResNet family, introduced by He et al., stands as a prominent contribution. The name "ResNet" is short for "Residual Network", reflecting the network's central innovation: residual blocks, which enable the training of very deep neural networks. Two variants are considered in this study, ResNet-50 and ResNet-101, where the numerals indicate the number of layers. Both models function like standard deep networks while additionally providing identity mappings through skip connections. Through this innovation, ResNet architectures mitigate the vanishing gradient problem, which allows the family to scale from 34 up to 152 layers. Figure 3 depicts the architecture that won the ILSVRC 2015 classification task, a testament to its efficacy.
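The effect of identity mappings on gradient flow can be illustrated with a scalar toy model. In a plain chain of layers, the gradient reaching the input is the product of the per-layer derivatives, whereas with a skip connection each layer computes x + F(x), so each factor becomes 1 + F'(x). The sketch below assumes a constant, small branch derivative in every layer; this is a deliberate simplification for illustration and not an analysis taken from the study.

```python
def plain_gradient(branch_derivs):
    """Gradient through a plain chain: product of the layer derivatives."""
    grad = 1.0
    for d in branch_derivs:
        grad *= d
    return grad


def residual_gradient(branch_derivs):
    """Gradient through residual layers x + F(x): product of (1 + F'(x))."""
    grad = 1.0
    for d in branch_derivs:
        grad *= 1.0 + d
    return grad


# Hypothetical setting: 50 layers, each branch with derivative 0.01.
derivs = [0.01] * 50
print(f"plain chain:    {plain_gradient(derivs):.3e}")
print(f"residual chain: {residual_gradient(derivs):.3f}")
```

In this toy setting the plain-chain gradient collapses towards zero, while the residual-chain gradient stays of order one because the identity path always contributes a factor of 1.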
The deeper variant, ResNet-101, extends to 101 layers, incorporating an increased number of residual blocks to capture more nuanced patterns and features within the dataset. Its application is deemed appropriate when tasks necessitate heightened accuracy or intricate feature extraction, such as image classification and object recognition. Inherently, the added depth translates to a greater parameter count, with ResNet-101 possessing approximately 44.5 million parameters, rendering it a more potent albeit computationally demanding model.
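The layer counts in the model names can be checked with a short calculation. The sketch assumes the standard bottleneck configurations of the ResNet family: four stages with (3, 4, 6, 3) blocks for ResNet-50 and (3, 4, 23, 3) blocks for ResNet-101, three convolutions per bottleneck block, plus one initial convolution and one final fully connected layer. The function name `weighted_layers` is hypothetical.

```python
# Bottleneck blocks per stage in the standard configurations (assumption).
BLOCKS_PER_STAGE = {
    "ResNet-50": (3, 4, 6, 3),
    "ResNet-101": (3, 4, 23, 3),
}
CONVS_PER_BOTTLENECK = 3  # 1x1, 3x3, 1x1 convolutions


def weighted_layers(stage_blocks):
    """Initial convolution + convolutions in all blocks + final classifier."""
    return 1 + CONVS_PER_BOTTLENECK * sum(stage_blocks) + 1


depths = {name: weighted_layers(blocks) for name, blocks in BLOCKS_PER_STAGE.items()}
print(depths)
```

The two variants differ only in the third stage, where ResNet-101 stacks 23 blocks instead of 6.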
The schematic of a residual block is depicted in Figure 3. ResNet-50 has also been applied outside agriculture. For fault diagnosis, ResNet-50 has been used as a transfer convolutional neural network (TCNN), leveraging its strength as a feature extractor after training on the ImageNet dataset, which demonstrates the combination of convolutional neural networks with transfer learning as a method for fault diagnostics. Moreover, the predictive capacity of ResNet-50 in the early diagnosis of Alzheimer's disease (AD) has been explored. Using magnetic resonance imaging (MRI) scans, ResNet-50 performs multi-class classification to determine the presence and severity of dementia according to the clinical dementia rating (CDR). This approach has demonstrated high precision in classifying AD, suggesting potential utility in identifying AD patients before clinical evaluation.
The ResNet-101 architecture is a deep convolutional neural network suited to a multitude of computer vision tasks. With 101 layers, it extends beyond ResNet-50 by providing additional layers for feature extraction. The architecture is organized as follows. An input layer receives an image, typically with three colour channels (red, green, and blue; RGB). An initial convolutional layer followed by max pooling extracts low-level features such as edges and textures, while the subsequent convolutional layers capture mid-level features such as shapes. At the heart of ResNet-101 lie the residual blocks, which are pivotal in mitigating the vanishing gradient problem and thereby enable the training of deep networks. These blocks consist of convolutional layers, batch normalization, and ReLU activation functions, together with skip connections that add the block's input elementwise to its output. ResNet-101 incorporates two types of residual blocks: identity blocks, which maintain input and output dimensions, and projection blocks, which modify dimensions on the shortcut path via a 1×1 convolutional layer. After the final residual block, global average pooling reduces the spatial dimensions while preserving salient features. A fully connected layer then performs classification, culminating in an output layer where a softmax activation function yields class probabilities for multi-class classification tasks.
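The identity block, the projection block, and the softmax output described above can be sketched on small vectors. The sketch is deliberately simplified: matrix-vector products stand in for convolutions, batch normalization is omitted, and all weights and names are hypothetical. It computes y = ReLU(F(x) + shortcut(x)), where the shortcut is either the input itself or a linear projection of it.

```python
import math


def matvec(weights, x):
    """Matrix-vector product; stands in for a convolution in this sketch."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_block(x, w1, w2, w_shortcut=None):
    """y = ReLU(F(x) + shortcut(x)) with F(x) = w2 * ReLU(w1 * x).

    w_shortcut=None gives an identity block (dimensions unchanged);
    a matrix gives a projection block (dimensions may change).
    """
    fx = matvec(w2, relu(matvec(w1, x)))
    shortcut = x if w_shortcut is None else matvec(w_shortcut, x)
    return relu([a + b for a, b in zip(fx, shortcut)])


def softmax(logits):
    """Numerically stable softmax turning logits into class probabilities."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


x = [1.0, -2.0]

# Identity block with an all-zero branch: the input passes through the skip path.
zero = [[0.0, 0.0], [0.0, 0.0]]
identity_out = residual_block(x, zero, zero)

# Projection block mapping 2 features to 3.
w1 = [[1, 0], [0, 1]]
w2 = [[1, 0], [0, 1], [1, 1]]
w_proj = [[1, 0], [0, 1], [0, 0]]
projection_out = residual_block(x, w1, w2, w_proj)

probs = softmax(projection_out)
print(identity_out, projection_out, probs)
```

The identity case shows why a residual block can easily represent "do nothing": when the branch contributes zero, the output is just the rectified input.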
The ResNet-101 architecture has been demonstrated to be a flexible construct, extensively applied across diverse computer vision domains. Its utility spans several key applications as follows:
(1) Image classification. ResNet-101 is often selected for image classification endeavors. Its architectural depth enables the intricate capture of patterns within images, a process critical for accurate categorization into predefined classes.
(2) Object detection. The capacity of ResNet-101 to serve as a potent feature extractor has been validated in object detection frameworks such as Faster R-CNN and Mask R-CNN. Here, it aids in the pinpointing and classification of objects across still images and video sequences.
(3) Semantic segmentation. The architecture is actively employed in semantic segmentation tasks, with its depth providing the means to execute detailed classification of individual pixels into specified categories.
(4) Transfer learning. The adaptability of ResNet-101 is particularly evident in transfer learning scenarios. Models pretrained on comprehensive datasets like ImageNet present an invaluable foundation for a multitude of vision-based projects, where fine-tuning on smaller datasets can elicit commendable outcomes despite limited sample volumes.
(5) Fine-grained classification. Tasks requiring discernment between closely related categories within a larger class benefit from the depth of ResNet-101, which facilitates nuanced differentiation.
(6) Medical image analysis. Its application has been extended to medical image analysis, addressing critical tasks such as disease detection, organ segmentation, and pathology classification.
(7) Scene recognition. Employed in the recognition and classification of scenes within images, ResNet-101 supports applications in autonomous navigation and content tagging.
In sum, the depth and structural ingenuity of ResNet-101, featuring residual blocks and skip connections, render it a robust mechanism for complex feature extraction challenges. This has solidified its status as a cornerstone in the computer vision research community and operational domain.
3. Results and Discussion
In the current investigation, the efficacy of transfer learning in classifying insect pest images from the Deng dataset was evaluated across nine distinct deep neural network architectures. Transfer learning, coupled with fine-tuning strategies, was employed to mitigate the risk of overfitting, a frequent challenge in machine learning tasks with limited data. Examination of the classification outcomes, as detailed in Table 2, indicates that with a diminutive dataset, the ResNet-50 model exhibited superior performance, achieving an accuracy of 99.40% and a precision of 99.10%. In comparison, the ResNet-101 model attained an accuracy of 97.63% and a precision of 97.69%.
The computational environment deployed for model training comprised an Nvidia GTX3060 Super GPU accelerator and an AMD Ryzen 7 3700X 8-core CPU with 32 GB of DDR4-3200 memory. A confusion matrix for the highest-performing neural network, ResNet-50, is illustrated in Figure 4. For comprehensive performance evaluation of the deep neural networks, accuracy, precision, recall (sensitivity), F1-score, and specificity were calculated. These metrics offer a multi-dimensional perspective on model efficacy, facilitating a nuanced analysis of the nine transfer learning models. Eqs. (1)-(4) define these metrics, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. Classification accuracy is defined as the ratio of correct predictions to the total number of predictions:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
Precision, also known as the positive predictive value, is the proportion of true positives among all instances predicted as positive:
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
Recall measures the fraction of true positives identified out of all actual positives:
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
The F1-score is the harmonic mean of precision and recall:
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
Sensitivity and specificity were also calculated. Sensitivity, which is identical to recall, is the proportion of correctly identified positive instances relative to the total number of actual positive instances. A sensitivity of 1.0 indicates that every positive instance was detected, whereas 0.0 indicates that none was. Specificity, similarly, is the ratio of true negative predictions to the total number of actual negative instances, TN / (TN + FP), with 1.0 being the ideal and 0.0 the worst case.
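The metrics defined in this section can be computed directly from a multi-class confusion matrix by treating each class in turn as the positive class (one-vs-rest). The following sketch uses a hypothetical 3-class matrix, not the results of this study, and the function names are illustrative.

```python
def overall_accuracy(confusion):
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_metrics(confusion, cls):
    """One-vs-rest metrics for class `cls`.

    confusion[i][j] is the number of samples of true class i predicted as j.
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp                  # class cls predicted as another class
    fp = sum(row[cls] for row in confusion) - tp   # other classes predicted as cls
    tn = total - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0    # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


# Hypothetical confusion matrix for three classes (rows: true, columns: predicted).
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
accuracy = overall_accuracy(confusion)
class0 = per_class_metrics(confusion, 0)
print(accuracy, class0)
```

Averaging the per-class values over all classes gives macro-averaged scores; whether the reported figures are macro- or micro-averaged is not stated in the text.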
Table 3. Comparison with prior state-of-the-art methods on the Small dataset (Deng et al.). Methods compared: Nanni et al., Deng et al., Wang et al., and the present study.
Table 3 presents a comparative analysis with preceding state-of-the-art research. The method described here outperformed the earlier benchmarks, achieving the highest accuracy of 99.40%, followed by the work of Nanni et al., which attained 95.52% accuracy. The adoption of ResNet models, particularly ResNet-50, in deep learning and computer vision offers numerous advantages, summarized as follows:
(1) Facilitation of deep network training. ResNet models incorporate skip connections, or residual blocks, which have been shown to alleviate the vanishing gradient issue. This architectural feature permits the training of networks with an extensive number of layers, thus averting performance degradation despite increased depth.
(2) Enhanced training efficiency. It has been observed that ResNet architectures often exhibit accelerated convergence during the training phase when contrasted with their predecessors. The skip connections facilitate easier learning of identity mappings, culminating in a reduced number of iterations required to reach optimal performance.
(3) Gradient flow optimisation. The integration of skip connections in ResNet models underpins a robust gradient signal across the network. Such an enhancement in gradient flow contributes to rapid convergence and stabilises the training of deeply layered networks.
(4) Benchmark performance. Models such as ResNet-101 and ResNet-152 have consistently delivered exemplary performance on a variety of computer vision benchmarks. Their proficiency encompasses tasks ranging from image classification to semantic segmentation.
(5) Versatile application. The utility of ResNet models spans a broad spectrum of computer vision tasks, evidencing their adaptability to varying challenges including but not limited to classification, detection, and segmentation. This versatility extends to transfer learning applications, where models pre-trained on extensive datasets prove beneficial.
(6) Potential of transfer learning. The availability of pre-trained ResNet models on datasets such as ImageNet facilitates their use in transfer learning contexts. Such utilization allows for a reduction in training duration while maintaining commendable results in scenarios characterised by limited data.
(7) Generalisation capability. Notably, ResNet models have demonstrated an impressive capacity to generalise across diverse datasets and contexts, affirming their applicability in a multitude of real-world scenarios.
(8) Architectural interpretability. The architecture of ResNet models, owing to the presence of skip connections, presents an enhanced level of interpretability. Such a structure permits a more coherent analysis of information flow and feature representation across different layers.
(9) Community and resource availability. The widespread adoption of ResNet models has fostered a substantial user community and a wealth of resources, ranging from pre-trained models to extensive literature, thereby easing the integration and utilization of these models in both research and practical applications.
(10) Discriminative feature learning. Deep feature representations learned by ResNet models are distinguished by their discriminative and transferable qualities, proving valuable for downstream tasks irrespective of the initial training objective.
In the landscape of computer vision, ResNet architectures have been discerned to offer a robust and potent framework, as evidenced by their extensive deployment across varied tasks within the domain. The profundity of their architectural design, coupled with the facilitation of training, has been observed to foster strong gradient flow and perpetuate a consistent performance that aligns with the apex of contemporary standards. These attributes have been pivotal in cementing the role of ResNet models as a foundational element in the advancement of deep learning methodologies and their practical applications within the sphere of computer vision.
The study delineated herein has leveraged the ResNet-50 and ResNet-101 frameworks to evaluate their efficacy in the identification, detection, and classification of insect pests. The application of ResNet in pest recognition has been substantiated as an efficacious methodology, particularly when utilizing pre-trained models in conjunction with limited labeled data sets. The objective pursued was to augment agricultural productivity by facilitating an automated, transfer learning-based mechanism for the prompt and effective detection of insect pests.
Analyses performed have revealed that the ResNet-50 transfer learning algorithm exhibits remarkable proficiency in the detection of agricultural pests, culminating in an unprecedented accuracy rate of 99.40%, thereby surpassing other transfer learning algorithms in performance metrics. This represents a significant stride in precision agriculture technologies. Prospective studies are advocated to further probe the potential of transfer learning within deep neural networks, with an emphasis on the minimization of computational time complexity inherent in the identification of insect pests. Such inquiries are anticipated to refine the efficiency and applicability of transfer learning algorithms in real-world agricultural settings.
The data used to support the research findings are available from the corresponding author upon request.
The author would like to thank all colleagues from Satya Wacana Christian University, Indonesia, and all involved in this research.
The authors declare no conflict of interest.