A Low-Cost YOLOv5-Based System for Automated Classification of Maize Seed Translucency
Abstract:
The physical quality of seeds is a critical determinant of sorting efficiency and crop productivity, yet conventional assessment approaches are often labor-intensive, invasive, and time-consuming. To address these limitations, computer vision-based methods have been increasingly adopted; however, most existing techniques rely primarily on reflected visible light, thereby capturing only surface-level features and limiting the detection of internal defects. In this study, a low-cost imaging system integrating both reflection and transmission of visible light was developed to enhance the characterization of maize seed translucency. By enabling simultaneous acquisition of information from the two principal faces of white maize seeds, a more comprehensive representation of both external morphology and internal structural variations was achieved. A comparative analysis was conducted between the conventional reflection-based method and the proposed imaging approach, with correlation coefficients between seed faces determined as 0.62 and 0.84, respectively, indicating a substantial improvement in feature consistency and information richness. A dedicated dataset was subsequently constructed using both imaging techniques and employed to train a YOLOv5s-based detection model over 200 epochs. The classification performance demonstrated a marked enhancement, with the proposed method achieving an accuracy of 93.07%, compared to 81.5% obtained using the conventional approach. Furthermore, real-time detection capability was validated through the implementation of the optimized imaging system, in which improved inference stability and robustness were achieved under practical operating conditions. The results indicate that the integration of transmission with reflection imaging provides a cost-effective and reliable solution for non-destructive seed quality assessment, offering significant potential for scalable deployment in agricultural sorting systems.
1. Introduction
Maize is the most widely grown grain in the world [1]. It plays a significant role in human and animal nutrition [2]. Along with rice, it is one of the most selected cereals as a niche crop in the National Development Strategy 2020--2030 (SND30) in Cameroon [3]. Therefore, its importance for the nation's food security is justified. Between 2018 and 2019, the Ministry of Agriculture and Rural Development (MINADER) conducted a survey in five regions of Cameroon (Adamawa, East Region, Far North Region, North Region, and West Region). This survey revealed that maize production declined from 1,429,813.9 tonnes to 1,352,985.9 tonnes [4], representing a decrease of 5.3\%. This decline in yield was also accompanied by a reduction in cultivated land. An increase in plot productivity can partially offset the yield losses associated with these land restrictions. Seed quality is a fundamental factor that affects crop productivity.
For farmers and small producers, quality control is based on physical appearance and is performed manually. However, manual sorting is tedious, time-consuming, and sometimes a source of disagreements [5]. As a result, there has been a global trend towards the development of rapid, non-destructive, and effective techniques for assessing seed quality [6]. Researchers have used artificial intelligence through supervised learning and imaging techniques to classify grain images [7], predicting grain quality indicators from digital or spectral images. Several studies have applied image processing to detect grains and assess their quality, mainly using computer vision, hyperspectral imaging, and spectroscopic techniques [6]. These approaches provide efficient quality control and detection based on internal, external, physical, biological, and chemical parameters [8]. However, most of these technologies are technically and economically inaccessible in rural areas of developing countries; among them, computer vision is the most accessible.
Several studies have combined computer vision, Red–Green–Blue (RGB) imaging, and deep learning techniques. Huang et al. [9] used GoogLeNet to classify maize seeds into five categories: sound, discolored, broken, moldy, and insect-damaged, achieving an accuracy of 95%. Kundu et al. [10] performed detection on a mixture of maize and millet using YOLOv5. The dataset was augmented using cropping, translation, and rotation techniques. However, grain translucency was not considered. The detection classes included sound maize, defective maize, millet, and maize-millet clusters. The model was trained for 300 epochs, achieving both accuracy and precision of 99%. Nagar et al. [11] conducted maize seed classification using five convolutional neural network models (ResNet18, SqueezeNet, ResNet50, WideResNet50, and MobileNet) across the following classes: pure, discolored, broken, and insect-damaged. The dataset was augmented using generative adversarial networks and batch active learning techniques to balance class sizes. The accuracy obtained without data augmentation was 71%, increasing to 79% with augmentation. The testing time was 682.02 seconds for 1,000 images. Koeshardianto et al. [2] performed maize grain classification using ResNet152v2 across four classes: sound, broken, discolored, and insect-damaged. Training was conducted over 25 epochs with a batch size of 512, resulting in an accuracy of 65% and a precision of 66%. Li et al. [12] utilized the translucency of yellow maize seeds to detect fractures using an efficient residual bilinear convolutional neural network combined with the discrete wavelet transform. Their results showed that combining images acquired with and without translucency yields better performance than using either type independently. The reported accuracy was 98.1%.
Other studies have employed spectral imaging and deep learning techniques to inspect the physical quality of maize seeds. Wang et al. [13] performed maize grain quality detection using the watershed algorithm and a dual-path convolutional neural network. Segmentation was applied to isolate high- and low-quality grains in the training set images and to localize them in the test set images. The dataset consisted of RGB and near-infrared images acquired without light transmission. The method achieved an accuracy of 95.63% and an F1-score of 95.46%. This combination of techniques demonstrates improved performance for field applications. Chen et al. [14] conducted the detection of yellow maize seeds using YOLOv8 based on the presence or absence of fractures. Spectral imaging with low-dose X-ray transmission was employed. However, the low translucency of maize seeds reduced the effectiveness of processing the acquired images. The reported accuracy and precision were 99.66% and 99.87%, respectively.
Previous studies using RGB imaging have exploited maize seed translucency to detect only internal and/or external fractures. Other spectral imaging techniques also utilize seed translucency to assess overall physical quality, including categories such as sound, moldy, broken, insect-damaged, and/or discolored. These methods are generally more costly, partly due to the equipment operating outside the visible light spectrum.
Some authors have combined RGB and near-infrared imaging to evaluate seed quality; however, this approach increases the prototype cost. In contrast, the present study employs visible-light translucency to determine the overall physical quality of maize seeds using computer vision, RGB imaging, and deep learning. The developed device is easily accessible in rural areas due to the availability of the materials and equipment used. Unlike previous studies relying solely on light transmission to detect fractures, this work does not limit seed quality assessment to the presence of cracks. Instead, seeds are classified into three categories: sound, discolored, and insect-damaged.
2. Materials and Methods
The designed image-capture device is inspired by the black box (Figure 1) used by Wang et al. [13] for computer vision with light transmission. It consists of four parts: the main housing, the camera mount, the grain mount, and the light-source mount. Its three-dimensional design (Figure 2) was created using Blender software. The main material used was 8 mm plywood, chosen to minimize the effects of reflection and transmission of visible light on the walls of the device. The camera used was a 50-megapixel model, providing images of 2944 $\times$ 2944 pixels. The data were transmitted to a computer via Wi-Fi using a Python program for greater flexibility.
As shown in Figure 2a, the camera mount rests on bolts fixed to the walls of the main housing. Three camera heights were considered: 15, 20, and 25 cm, chosen to ensure optimal image sharpness. White paper was used as the seed support to homogenize the light received from multiple sources. It was fixed to a wooden frame placed on the assembly brackets. The light-source support was held in place by inserting a flat metal pivot into the slots in the main housing. The detailed dimensions of the image-capture device, which keep it compact and easy to handle, are shown in Figure 3.
The electrical circuit diagram of the image-capture device is shown in Figure 4. It represents a simple lighting circuit with 220 V lamps connected in parallel. This circuit is easily accessible locally, and its maintenance is straightforward. The electronic wiring of the liquid crystal display screen to an Arduino Mega board (Figure 5) was created using Fritzing. These wiring components are also readily available locally and are intended to display detection results on the device's screen.





The calibration of the image-capture device involves determining its operating parameters to optimize the sharpness of the captured images. These parameters include the position of the light source, the lamp power (which affects the light intensity), the height of the camera, the characteristics of the paper supporting the seeds, and the number of lamps. The position of the light source refers to whether it is placed above or below the seeds. This choice was made by calculating and comparing the correlation coefficients between the two main sides of the seeds under each configuration. The correlation coefficient indicates whether information on one side can be perceived independently of the other. A high correlation coefficient suggests that one side transmits information to the camera, even when the light source is below. Lamp power levels were chosen as 5 W (440 lm), 9 W (800 lm), 10 W (900 lm), and 15 W (1500 lm). Camera heights were set at 15 cm, 20 cm, and 25 cm. Eighteen different characteristics of the seed-support papers were considered (Table 1), including four types of A4 paper (80 g/m$^{2}$, 125 g/m$^{2}$ glossy, 150 g/m$^{2}$ glossy, and 180 g/m$^{2}$ curled) and three types of assembly methods (single staples, transparent glue with staples, and white glue with single staples). These variations allow for the analysis of the influence of paper type and the number and type of assemblies on the dependent variables measured.
| Identifier | Description | Identifier | Description |
|---|---|---|---|
| 1 | A single A4 paper of 80 g/m$^{2}$ | 10 | Three stapled A4 glossy papers of 125 g/m$^{2}$ |
| 2 | A single A4 glossy paper of 125 g/m$^{2}$ | 11 | Four stapled A4 papers of 80 g/m$^{2}$ |
| 3 | A single A4 glossy paper of 150 g/m$^{2}$ | 12 | Two A4 papers of 80 g/m$^{2}$ stapled and bound with transparent glue |
| 4 | A single A4 textured paper of 180 g/m$^{2}$ | 13 | Two A4 glossy papers of 125 g/m$^{2}$ stapled and bound with transparent glue |
| 5 | Two stapled A4 papers of 80 g/m$^{2}$ | 14 | Two A4 glossy papers of 150 g/m$^{2}$ stapled and bound with transparent glue |
| 6 | Two stapled A4 glossy papers of 125 g/m$^{2}$ | 15 | Two textured A4 papers of 180 g/m$^{2}$ stapled and bound with transparent glue |
| 7 | Two stapled A4 glossy papers of 150 g/m$^{2}$ | 16 | Two A4 papers of 80 g/m$^{2}$ stapled and bound with white glue |
| 8 | Two stapled A4 textured papers of 180 g/m$^{2}$ | 17 | Two A4 glossy papers of 125 g/m$^{2}$ stapled and bound with white glue |
| 9 | Three stapled A4 papers of 80 g/m$^{2}$ | 18 | Two A4 glossy papers of 150 g/m$^{2}$ stapled and bound with white glue |
These variables included the average brightness, maximum brightness, contrast, and useful diameter. They were combined through a weighted analysis using Eq. (1). The number of lamps connected ranged from 1 to 4. This choice was motivated by the observation that the spacing between lamps reduces the light reaching the seeds; the number of lamps must therefore be selected so that the average brightness of the source decreases only slightly as lamps are added.
where, $e$ is the weighted average, $E_{\text {avg }}$ is the average brightness, $D$ is the effective diameter, $C$ is the contrast, and $E_{\max }$ is the maximum brightness. All terms on the right-hand side of the equation were normalized to unity using Eq. (2) for each variable $v$.
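Since Eq. (1) combines the normalized variables into a single score, the selection logic can be sketched as follows. The weights of the combination are not given in the text, so equal weights are assumed purely for illustration, and the helper names are hypothetical.

```python
# Sketch of the weighted decision score of Eq. (1) with the min-max
# normalization of Eq. (2). The actual weights are not specified in the
# paper; equal weights are an assumption made for illustration only.

def normalize(values):
    """Scale a list of measurements to [0, 1] (Eq. (2))."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def weighted_score(e_avg, d, c, e_max, weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine the normalized variables into the score e of Eq. (1)."""
    terms = (e_avg, d, c, e_max)
    return sum(w * t for w, t in zip(weights, terms))

# Example: three candidate paper characteristics, one list per variable.
e_avg = normalize([180.0, 210.0, 150.0])  # average brightness
d     = normalize([120.0, 140.0, 100.0])  # useful diameter
c     = normalize([30.0, 45.0, 20.0])     # contrast
e_max = normalize([220.0, 240.0, 200.0])  # maximum brightness

scores = [weighted_score(e_avg[i], d[i], c[i], e_max[i]) for i in range(3)]
best = scores.index(max(scores))  # index of the best characteristic
```

The characteristic that dominates every normalized variable receives the maximum score of 1.0, mirroring how the best paper characteristic was selected in Figure 11.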
The dependent variables used for this calibration were brightness, contrast, and diameter at maximum illumination. Brightness must be high to facilitate the transmission of seed information through light. Grain contrast must be high to enable the distinction of class-discriminating patterns. Paper contrast must be low to ensure better homogeneity of the surface occupied by the seeds. The diameter at maximum illumination (gray level above 180) must be large to provide a more effective surface area for transmitting seed information. The dependent variables and corresponding curve plots were calculated using Eqs. (3)–(5) and Python programs executed in the Anaconda Prompt.
Brightness is determined by Eq. (3), where GLCM denotes the gray level co-occurrence matrix.
The contrast is given by Rizzi et al. [15] in Eq. (4), where $P_i$ is the grayscale intensity of pixel $i$ in the gray level co-occurrence matrix, and $C$ is the contrast.
The image size used for determining paper contrast was set to 400 $\times$ 400 pixels. It was divided into 25 blocks of 80 $\times$ 80 pixels each for the calculation in Equation (4). The correlation coefficient is given by Goshtasby et al. [16].
$$ r = \frac{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}{n\,\sigma_x\,\sigma_y} \quad (5) $$
where, $x_i$ is the value of pixel $i$ in the first flattened image; $y_i$ is the value of pixel $i$ in the second flattened image; $\bar{x}$ and $\bar{y}$ are the average pixel values of the first and second images, respectively; $\sigma_x$ and $\sigma_y$ are the standard deviations of the image vectors $x$ and $y$, respectively; $n$ is the number of pixels in an image; and $r$ is the correlation coefficient. All images used were of the same size.
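The correlation of Eq. (5) follows directly from these definitions; a minimal sketch using NumPy arrays in place of real seed-face images:

```python
import numpy as np

def face_correlation(img_a, img_b):
    """Pearson correlation coefficient r (Eq. (5)) between two
    equally sized grayscale images, flattened to vectors."""
    x = np.asarray(img_a, dtype=float).ravel()
    y = np.asarray(img_b, dtype=float).ravel()
    n = x.size
    # Population standard deviations, matching the n in the denominator.
    return float(np.sum((x - x.mean()) * (y - y.mean()))
                 / (n * x.std() * y.std()))

# Identical images correlate perfectly; inverted images correlate at -1.
a = np.array([[10, 20], [30, 40]])
print(face_correlation(a, a))        # ≈ 1.0
print(face_correlation(a, 255 - a))  # ≈ -1.0
```

In the experiment, `img_a` and `img_b` would be the images of the two main faces of the same seed under a given lighting configuration.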
The dataset consisted of images of isolated maize seeds from three different classes (Figure 6): good seeds, discolored seeds, and damaged seeds. Images were obtained from a local seed producer. White maize of the Kasai variety was used. In this study, the discolored class also includes seeds showing mold.

To obtain this dataset, images of multiple spaced seeds were captured using the developed device. After cropping, these images were segmented through computation of the gray-level co-occurrence matrix, binarization, erosion, and extraction of the contours and coordinates of the bounding box enclosing each seed. The resulting 2,909 images were 192 $\times$ 192 pixels and were automatically labeled in You Only Look Once (YOLO) format using a Python program. The distribution of images by class is provided in Table 2. No data augmentation techniques were applied during training.
| Class | Label | Training Set | Validation Set | Test Set | Total | Percentage (%) |
|---|---|---|---|---|---|---|
| Good seed | 0 | 649 | 215 | 190 | 1054 | 36.23 |
| Discolored seed | 1 | 634 | 205 | 161 | 1000 | 34.37 |
| Gnawed seed | 2 | 528 | 171 | 156 | 855 | 29.40 |
| Total | | 1811 | 591 | 507 | 2909 | 100 |
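The automatic labeling step converts each seed's pixel bounding box into a YOLO-format annotation line. The `yolo_label` helper below is a hypothetical sketch of that conversion, not the authors' actual script:

```python
def yolo_label(class_id, box, img_w, img_h):
    """Convert a pixel bounding box (x_min, y_min, x_max, y_max) into a
    YOLO-format label line: class x_center y_center width height,
    all normalized to [0, 1] by the image dimensions."""
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# A good seed (class 0) occupying pixels (48, 48)-(144, 144)
# in a 192 x 192 crop:
print(yolo_label(0, (48, 48, 144, 144), 192, 192))
# -> "0 0.500000 0.500000 0.500000 0.500000"
```

One such line per seed, written to a `.txt` file alongside each image, is exactly what YOLO training expects.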
The camera was positioned according to the height determined during the calibration of the imaging device. The temperature inside the device during operation ranged between 28$^\circ$C and 29$^\circ$C. Twenty maize seeds were imaged at various spacings. The resulting images were then processed using a machine learning model to determine the minimum spacing between kernels that could be detected. The developed dataset was used for transfer learning of three YOLO models (in Anaconda Prompt and Google Colab) to facilitate performance evaluation. Training in Anaconda Prompt was performed in central processing unit mode with 8 GB of random access memory, whereas training in Google Colab was performed in graphics processing unit mode with 12.5 GB of random access memory. The training parameters are as follows:
Initial weights: YOLOv5s.pt, YOLOv8s.pt or YOLO11s.pt;
Number of epochs: 200;
Learning rate: 0.01;
Optimizer: Stochastic gradient descent;
Batch size: 16;
Save period: 50 epochs.
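Assuming the standard ultralytics/yolov5 `train.py` command-line interface, the training run described above could be assembled as follows; the dataset file name `seeds.yaml` is an assumption for illustration:

```python
# Illustrative sketch: assembling the YOLOv5 training invocation from the
# parameters listed above. The dataset YAML name is hypothetical; the
# initial learning rate of 0.01 corresponds to the default lr0 in the
# yolov5 hyperparameter file rather than a command-line flag.
def build_train_command(weights="yolov5s.pt", data="seeds.yaml"):
    return [
        "python", "train.py",
        "--weights", weights,       # initial weights (v5s / v8s / 11s runs)
        "--data", data,             # dataset configuration file
        "--epochs", "200",          # number of epochs
        "--batch-size", "16",       # batch size
        "--img", "192",             # input size matching the 192 x 192 crops
        "--optimizer", "SGD",       # stochastic gradient descent
        "--save-period", "50",      # checkpoint every 50 epochs
    ]

cmd = build_train_command()
```

The same argument list applies whether the command runs on the CPU setup (Anaconda Prompt) or the GPU setup (Google Colab); only the execution environment changes.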
The performance obtained with the dataset reflects the quality of the methodology used to build it. It was evaluated in two ways:
Variability of the collected data: The average of the images captured for the training and validation datasets was calculated. This helped verify whether the dataset exhibited sufficient variability. Such variability is characterized by an average image in which the original object cannot be recognized [17]. In this study, it was necessary to confirm whether the seeds could still be distinguished based on the average of the training and validation images.
Metric performance of the trained models: The performance metrics given by Eqs. (6)–(10), namely, accuracy, precision, recall, F1-score, and class error (mean squared error), were used for evaluation. The confusion matrix was also employed to assess the detection performance of the trained models.
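Accuracy, precision, recall, and F1-score follow their standard definitions, so a minimal sketch of these computations from a confusion matrix may be useful (the toy matrix below is illustrative, not measured data):

```python
import numpy as np

def per_class_metrics(cm):
    """Accuracy plus per-class precision, recall, and F1-score from a
    confusion matrix whose rows are true classes and columns are
    predicted classes (standard definitions; cf. Eqs. (6)-(10))."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                    # true positives per class
    precision = tp / cm.sum(axis=0)     # TP / (TP + FP), per column
    recall = tp / cm.sum(axis=1)        # TP / (TP + FN), per row
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1

# Toy 3-class matrix (good, discolored, damaged):
cm = [[90,  5,  5],
      [10, 80, 10],
      [ 5,  5, 90]]
acc, p, r, f1 = per_class_metrics(cm)
```

The class error reported later (mean squared error) is computed separately by the training framework; the sketch covers only the four metrics above.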
Real-time detection allows for the identification of flaws during the practical use of the developed device. Modifications were made to the device's structure, detection code, and data collection procedure to achieve better real-time performance. The following steps were performed:
Homogenization of illumination for the paper bearing seeds: The use of multiple lamps to increase the illumination area resulted in non-uniform lighting, which reduced the real-time performance of the device. To solve this problem, the device was modified by adding a beam-channeling element to direct the light onto the paper. As a result, the lamps no longer need to be pressed against the paper, as the emitted beams reach it directly. A new dataset was collected using the modified device.
Improving data variability: Because the initial dataset did not undergo data augmentation, the position of the seeds (angle, coordinates in the horizontal plane, and the side in contact with the paper) significantly influenced real-time prediction stability. To address this, the same seeds were repositioned to capture multiple images. After capturing an image, the seeds were mixed and spread out again for a new capture. This randomization of seed positions effectively minimizes the impact on real-time predictions.
Improving the accuracy of real-time predictions: The size of the input image and the number of objects to be detected within it influence prediction accuracy. During the initial use of the device, detection was performed on images containing multiple seeds. It was subsequently observed that using images of isolated seeds, with their sizes specified, improved the predictions. One explanation for this behavior is that the model was trained on images of isolated seeds. Detection was therefore performed in two stages. First, the positions of multiple seeds in the image were identified. Then, the image was cropped to 192 $\times$ 192 pixels around each seed. Each cropped seed image was processed using the detection model to produce the final predictions. The results of real-time detection were recorded in MP4 files.
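The 192 $\times$ 192 cropping window of the second stage must stay inside the image even for seeds near a border. A minimal sketch, with `crop_around` as a hypothetical helper:

```python
def crop_around(center, img_w, img_h, size=192):
    """Return the (x_min, y_min, x_max, y_max) window of `size` pixels
    centered on a detected seed, clamped so it stays inside the image."""
    cx, cy = center
    half = size // 2
    x_min = min(max(cx - half, 0), img_w - size)
    y_min = min(max(cy - half, 0), img_h - size)
    return x_min, y_min, x_min + size, y_min + size

# Seed near the image border: the window is shifted to remain in-bounds.
print(crop_around((10, 10), 2944, 2944))      # (0, 0, 192, 192)
# Seed at the image center: the window is symmetric around it.
print(crop_around((1472, 1472), 2944, 2944))  # (1376, 1376, 1568, 1568)
```

Each returned window would then be sliced from the full 2944 $\times$ 2944 frame and passed to the detection model.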
3. Results and Discussion
The developed image-capture device is shown in Figure 7. Components (a)–(h) correspond to those presented in Figure 2 during the design phase. This device is compact and can easily blend into rural areas due to the predominance of wood in such settings.

(a) Choice of light source position
Table 3 presents the correlation coefficients between the two main faces of the seeds (Figure 8) when light is reflected and transmitted. This coefficient was higher (0.84) when the light source was positioned below the seeds (light transmission). This indicates that the passage of light through the seed increases the information perceived from the face in contact with the paper. Figure 8c illustrates this observation. At the indentation on Face 1 of the seed, a highlight is observed when viewing Face 2, which transmits light. This light source position involves not only the transmission of light but also its reflection by the seed. When light passes through the paper, it diffuses off the walls of the apparatus before being reflected by the seed. If the translucent parts of the paper that do not support the seed are obstructed, it becomes difficult to observe their physical characteristics.
| Seed ID | Reflection (source above) | Transmission (source below) |
|---|---|---|
| Seed 1 | 0.57 | 0.90 |
| Seed 2 | 0.68 | 0.95 |
| Seed 3 | 0.71 | 0.92 |
| Seed 4 | 0.43 | 0.57 |
| Seed 5 | 0.57 | 0.93 |
| Seed 6 | 0.70 | 0.92 |
| Seed 7 | 0.75 | 0.71 |
| Seed 8 | 0.77 | 0.92 |
| Seed 9 | 0.58 | 0.71 |
| Seed 10 | 0.45 | 0.88 |
| Mean (std. dev.) | 0.62 (0.11) | 0.84 (0.12) |

(b) Choice of lamp power and camera height
Figure 9 and Figure 10 show the average brightness and contrast within the seed, respectively. The 15 W lamp and a camera height of 25 cm provided the best values, as both parameters need to be high. High light intensity indicates that the brightness passing through the seed effectively highlights its translucency. High contrast indicates that texture irregularities within the seed are revealed by the passage of light through it. It was observed that brightness and contrast within the seed increase with lamp power and with the camera-to-seed-support height. The influence of lamp power on these dependent variables is evident. The increase in brightness and contrast with camera-to-seed-support height is caused by light diffusion within the device enclosure. Indeed, the larger the volume between the camera and the support, the more the light is scattered repeatedly off the wooden walls. This diffusion increases the brightness on the observed face of the seed, thereby better balancing the effect of transmitted light on the camera.
(c) Choice of paper characteristics
The average variable for all paper characteristics is shown in Figure 11. Papers with low average values were inefficient in transmitting light effectively during image capture.
According to Figure 11, the weighted average is highest (0.83) for Characteristic 6, followed by Characteristics 3 and 17. A support with these characteristics is shown in Figure 12. Their high weighted average was attributed to the combination of light intensity and contrast in the high-brightness areas.
Characteristic 6 represents two 125 g/m$^{2}$ sheets of paper simply stapled together and exhibits the highest decision variable. However, the presence of a blue tint introduces the risk of light absorption and scattering within the paper. The gap between the two stapled sheets was not conducive to optimal light transmission. A close-up observation of the most illuminated region of the paper with Characteristic 6 reveals the presence of additional spectral components beyond uniform white light (Figure 13). This indicates increased light decomposition compared to Characteristic 3, which may alter the information transmitted through the translucent seed.





Characteristic 3 represents 150 g/m$^{2}$ glossy paper. It has the second-highest value for the decision variable due to its smooth (thinner) texture and compact structure. Therefore, this study was conducted using A4-sized glossy paper weighing 150 g/m$^{2}$.
(d) Choice of the number of lamps
Each lamp number was associated with a specific configuration (A, B, C, and D), as shown in Figure 14.

Table 4 shows the variations in brightness according to the number of lamps added. Three lamps keep the brightness reduction close to a 2% margin and were therefore retained. It is observed that brightness decreases as the number of lamps increases. This has two causes. First, the spaces between the lamps receive less light intensity compared to the area directly above each lamp. This difference in intensity in the inter-lamp spaces contributes to reducing the overall brightness of the illuminated surface. Second, a higher number of lamps also results in more light being diffused in the space between the support and the camera. This diffused light balances the transmitted light by reducing its intensity. As observed in the interpretation of Figure 9 and Figure 10, this diffused light also contributes to the clarity of the information recorded by the camera.
| Lamp Number | Average Brightness | Brightness Reduction |
|---|---|---|
| 1 | 227.33 | n/a |
| 2 | 223.84 | 1.5% |
| 3 | 222.35 | 2.2% |
| 4 | 217.14 | 4.5% |
Therefore, after calibration, the light source was positioned below the seed, a 15 W lamp was used, the camera height was set to 25 cm, A4 glossy paper with a density of 150 g/m$^{2}$ was selected, and three lamps were employed.
Figure 15 shows an example of images of multiple high-quality seeds captured before segmentation. These images allow observation of the good translucency of the seeds, giving them a glassy appearance. The seeds are spaced within the image to facilitate automatic segmentation. The orientation of the seeds is not uniform, which contributes to the spatial variability of the objects in the image. Some maize seeds in the last two rows of Figure 15 are less translucent and appear darker. Due to this low surface illumination, they could be mistaken for discolored seeds.

Figure 16 shows several sample images from the dataset with light transmission. This demonstrates the accurate automatic labeling of individual seed images. Discolored seeds labeled as 1 appear yellow when they retain good translucency. Some resemble high-quality seeds due to the presence of discoloration on naturally opaque areas (base and tip of the seed). Damaged seeds labeled as 2 may exhibit indentations characterized by very high light transmission compared to the rest of their surface. Some high-quality seeds labeled as 0 may appear discolored due to low translucency, producing an optical effect that alters the color perceived by the camera. The brightness of the illuminated support at the location of these seeds also influences this optical effect.

Table 5 presents the accuracy, mean average precision, and F1-score for the training of the YOLOv5s, YOLOv8s, and YOLO11s models. Overall, the metrics are better for the YOLOv5 model. Higher accuracy indicates effective classification, while a high mean average precision reflects the model's robustness in localizing the correct objects. The F1-score represents the balance between precision and recall. Thus, the YOLOv5s model is better suited for this seed detection task. Indeed, the YOLOv5 model incorporates anchor boxes and Cross Stage Partial (CSP) modules. These features allow for better recognition of object shapes in the image and improved extraction of subtle textural details, respectively. YOLOv5 performs well on clean, well-curated datasets, such as the maize seed dataset used in this study. YOLOv8 and YOLO11 are more efficient in contexts requiring versatility or faster execution. Moreover, YOLOv8 may be more suitable outside the experimental context due to its higher F1-score, which reflects its versatility for field detection. The metrics for the dataset with light transmission are higher than those for the dataset without light transmission. This corroborates the conclusion obtained from the comparison of correlation coefficients during the calibration of the imaging device. The additional information available in the light-transmission dataset provides more details, enabling easier and more robust training of the YOLO model.
| Model | Translucency: max. accuracy (%) | Translucency: max. mean average precision (%) | Translucency: F1-score (%) | Opaque: max. accuracy (%) | Opaque: max. mean average precision (%) | Opaque: F1-score (%) |
|---|---|---|---|---|---|---|
| YOLOv5 | 93.07 | 97.76 | 79.28 | 81.5 | 80.32 | 75.44 |
| YOLOv8 | 92.12 | 91.72 | 85 | 81.13 | 81.04 | 73.24 |
| YOLO11 | 92.56 | 92.38 | 82.72 | n/a | n/a | n/a |
(a) Variability of collected data
The variability of the collected data was evaluated on two main aspects: the physical appearance of the seeds and their position on the paper. The average image of multiple seed images is shown in Figure 17. Regarding physical appearance, the seeds are no longer recognizable in the average image: the physical characteristics specific to maize seeds are heavily blurred, and the object type cannot be identified. This occurs because pixels at the same position in different images do not convey the same seed texture information, and averaging them disrupts the overall texture of the seeds, indicating acceptable variability in seed appearance. Regarding the position of the seeds on the paper, distinct positions remain distinguishable at the top of the image. This indicates that certain seed positions in the upper part occur more frequently than others, which is disadvantageous for positional variability in the dataset. In the lower part of the average image, positions are not clearly identifiable, showing a better spatial distribution of seeds in this area of the paper. The poor distribution of seeds at the top of the paper can be explained by the narrowing of the illuminated surface, which leaves less room for seed placement. Therefore, variability in seed position is acceptable in the lower part of the image but poor in the upper part.
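The average-image computation behind this variability check can be sketched as follows, using tiny synthetic images in place of the real dataset:

```python
import numpy as np

def average_image(images):
    """Pixel-wise mean of a list of equally sized grayscale images.
    If the mean no longer shows a recognizable seed, the dataset's
    appearance variability is considered sufficient [17]."""
    stack = np.stack([np.asarray(im, dtype=float) for im in images])
    return stack.mean(axis=0)

# Two synthetic 'seeds' at different positions: averaging blurs both,
# halving each peak, so neither object remains clearly recognizable.
img1 = np.zeros((4, 4)); img1[0, 0] = 200.0
img2 = np.zeros((4, 4)); img2[3, 3] = 200.0
avg = average_image([img1, img2])
```

Applied to the full training and validation sets, the same operation produces the average image shown in Figure 17.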

(b) Metric performance of the trained YOLOv5s model
The YOLOv5 model trained with the light-transmission dataset was considered superior to the others (Table 5). Its performance metrics are presented in Table 6. Lower precision compared to recall indicates that, overall, the model predicts more positive values per class than actually exist. For the light-transmission dataset, false positive predictions mainly occur in the background. The error across classes is below 2\%, demonstrating that the classification achieves a high prediction accuracy.
| Metric | Value |
|---|---|
| Accuracy | 93.07% |
| Precision | 77.31% |
| Recall | 81.36% |
| Class error | 0.0114 |
Nagar et al. [11] reported an accuracy of 79.24% for classifying maize seeds into four classes using the ResNet18 model over 70 epochs. This result is lower than that obtained in the present study, indicating that the YOLOv5 model is better suited than ResNet18 for this classification task. The specific purity reported by Nagar et al. [11] was 88.25%, which is lower than the 95.9% achieved in this study. This demonstrates that the use of seed translucency improves discrimination between high-quality and defective seeds. The seed-specific purity standard in Cameroon is 98%, slightly higher than that obtained in this study.
Class-wise accuracies after training were 95.9%, 83.7%, and 87.8% for high-quality, discolored, and damaged seeds, respectively. These results indicate that discolored seeds are the most difficult to classify. This can be explained by their low translucency, which affects the digital data obtained. When translucency is low, the seed image may appear darker, masking certain discriminative details. Indeed, discolored seeds, often moldy, lose their glassy appearance due to the degradation of storage proteins and structural modification of starch. Their new, floury appearance is more opaque to visible light. Discolored seeds may be confused with high-quality or damaged seeds with low translucency, as the camera perceives a darker surface in both cases. This motivates a follow-up study on calibrating two light sources positioned above and below the seeds.
The real-time detection interface is shown in Figure 18. It displays the total number of seeds, the number of seeds per class, and the coordinates of the seeds according to their classes. The coordinates are displayed to demonstrate their availability in case a sorting machine uses this detection model. For most seeds, high translucency allows the class-specific characteristics to be easily identified. Seeds that appear very dark due to poor translucency are assigned to the discolored class. Confidence levels exceed 90\% when there is no identification fluctuation.

Fluctuations in predictions have been observed to depend on seed translucency, as the detection model relies on this property. Therefore, further calibration was performed using two light sources. The first source, placed above the seeds, allows the observation of translucency defects, while the second source, placed below the seeds, enhances internal features and reveals the hidden side of the seeds. Proper calibration of the light intensities for these two sources significantly reduces inference fluctuations. The proposed detection protocol was most effective for maize seeds with high translucency. Since seed storage reduces translucency, it is important to characterize and sort seeds immediately after harvesting and drying.
4. Conclusion
The aim of this study was to use the translucency of white maize kernels to assess their physical quality. To this end, an image-capture device was developed, consisting of three 15 W lamps placed beneath a 150 g/m$^{2}$ glossy paper and a 50-megapixel camera positioned 25 cm above the kernel support. The correlation between the images of the two main faces of the kernels showed, on average, that more than 22% of the information could be captured using translucency compared to without it. The dataset comprised 2,909 images, each 192 $\times$ 192 pixels in size. Calculating the average of the images revealed significant variability in the appearance of the kernels and moderate variability in their position within the image. Among the training sessions conducted using the YOLOv5, YOLOv8, and YOLO11 models, the YOLOv5 model achieved the highest accuracy. The dataset incorporating seed translucency yielded a maximum accuracy of 93.07%, which was higher than the 81.5% accuracy obtained using the dataset without seed translucency. These results demonstrate a clear advantage in using the translucency of white maize seeds for detecting their physical quality with computer vision.
Conceptualization, C.N.K. and A.R.T.; methodology, A.R.T. and G.H.D.; software, A.R.T. and C.N.K.; validation, J.K.T. and A.R.T.; formal analysis, A.R.T., G.H.D., C.N.K., and J.K.T.; investigation, A.R.T., G.H.D., C.N.K., and J.K.T.; resources, A.R.T., G.H.D., C.N.K., and J.K.T.; data curation, A.R.T., G.H.D., C.N.K., and J.K.T.; writing—original draft preparation, C.N.K. and A.R.T.; writing—review and editing, C.N.K. and A.R.T.; visualization, J.K.T.; supervision, J.K.T. All authors have read and agreed to the published version of the manuscript.
The data used to support the research findings are available from the corresponding author upon request.
The authors declare no conflict of interest.
