Volume 5, Issue 2, 2026

Abstract

Automated grading has become an important component of digital transformation in K-12 education, yet the structured recognition of handwritten responses on answer sheets remains a practical challenge. General-purpose vision-language models often show limited robustness when applied directly to school assessment materials, particularly in the presence of fixed answer regions, mixed Chinese-English content, and diverse handwriting styles. To address this issue, this study develops a task-oriented fine-tuning framework for automated recognition of handwritten answer sheets in K-12 educational settings. A multimodal dataset was constructed from Chinese and English answer sheets, with region-level annotations designed to support structured text extraction. Based on this dataset, the Qwen2.5-VL-7B-Instruct model was adapted through LoRA-based fine-tuning under a dual-A16 GPU environment to reduce computational cost while preserving practical deployment feasibility. An end-to-end workflow covering data preparation, model training, weight merging, and inference was then established for structured JSON output. Experimental results show that the fine-tuned model achieved stable convergence in both small-sample and medium-sample settings and improved the extraction quality of handwritten responses within predefined answer regions. The proposed framework provides a practical and reproducible solution for deploying vision-language models in school grading scenarios with limited computing resources. The study also offers an application-oriented reference for the integration of multimodal large models into educational assessment systems.
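
The structured JSON output step described above can be illustrated with a minimal sketch. It assumes the model returns one JSON object that maps each predefined answer region to its recognized text. The region identifiers (`q1`, `q2`) and the function name `parse_answer_sheet` are hypothetical and are not taken from the study.

```python
import json


def parse_answer_sheet(raw_response, expected_regions):
    """Parse a model response and keep only the predefined answer regions.

    Assumption: the response is a JSON object of the form
    {"<region id>": "<recognized text>", ...}. The schema is illustrative.
    """
    data = json.loads(raw_response)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object keyed by region id")

    # Fail loudly if the model skipped a region instead of guessing its content.
    missing = [region for region in expected_regions if region not in data]
    if missing:
        raise ValueError(f"missing answer regions: {missing}")

    # Normalize every value to a stripped string for downstream grading.
    return {region: str(data[region]).strip() for region in expected_regions}


if __name__ == "__main__":
    response = '{"q1": " B ", "q2": "apple"}'
    print(parse_answer_sheet(response, ["q1", "q2"]))
```

Validating against a fixed region list keeps malformed or partial model outputs out of the grading step, which matters when the answer regions are defined in advance.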

Abstract

Small object detection in aerial imagery remains challenging due to limited spatial resolution, background clutter, and severe scale variation. Existing deep learning–based detectors often suffer from weakened shallow representations and insufficient cross-scale feature interaction, leading to missed detections and unstable localization in dense scenes. This work presents Dynamic Reconstruction and Fusion Network (DRF-Net), a frequency-guided feature reconstruction framework for small object detection. Built upon a one-stage detection paradigm, the proposed method introduces three key components: a frequency-guided channel–spatial augmentation (FCSA) module to enhance fine-grained representations, a multi-frequency reconstruction block (MFRB) to restore cross-scale structural information, and a Dynamic Reconstruction Fusion Neck (DRF-Neck) to adaptively regulate multi-scale feature aggregation. By jointly modeling high- and low-frequency components and integrating saliency-aware fusion mechanisms, the framework improves the preservation of small-object contours while suppressing redundant background responses. Extensive experiments conducted on the VisDrone2019 benchmark demonstrate that DRF-Net consistently outperforms the baseline detector in terms of detection accuracy, particularly for small and densely distributed objects, while maintaining real-time inference efficiency. Ablation studies further verify the complementary contributions of the proposed modules to feature representation and fusion stability. The results indicate that frequency-guided reconstruction and dynamic fusion provide an effective learning strategy for enhancing small-object detection performance in complex visual scenes.
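
The idea of separating high- and low-frequency components can be shown on a one-dimensional toy signal. The sketch below is not the FCSA or MFRB module, whose internals are not given here. It assumes a simple box filter as the low-pass step and treats the residual as the high-frequency part. The function name `split_frequencies` is hypothetical.

```python
def split_frequencies(signal, window=3):
    """Split a 1-D signal into low- and high-frequency parts.

    Assumption: a moving average (box filter) stands in for the low-pass
    step; the high-frequency part is the residual signal - low.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(signal)
    low = []
    for i in range(n):
        # Shrink the window at the borders instead of padding.
        start, stop = max(0, i - half), min(n, i + half + 1)
        low.append(sum(signal[start:stop]) / (stop - start))
    high = [s - l for s, l in zip(signal, low)]
    return low, high


if __name__ == "__main__":
    # A single sharp spike stands in for a small object on a flat background.
    low, high = split_frequencies([0, 0, 9, 0, 0])
    print(low)   # smooth background estimate
    print(high)  # sharp detail around the spike
```

In this toy case the spike survives mainly in the residual, which is one intuition for why high-frequency cues help preserve small-object contours.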
Open Access
Research article
A Baseline Optical Character Recognition Framework for Printed Kashmiri Nastaliq Text Using Deep Learning
Sheikh Amir Fayaz,
Muzamil Majeed Khaja,
Abdul Saboor Bhat,
Danish Mansoor,
Anu Thapa,
Majid Zaman
Available online: 04-24-2026

Abstract

Optical Character Recognition (OCR) plays a crucial role in the digitization and preservation of textual information; however, for low-resource languages such as Kashmiri, reliable OCR solutions remain largely unavailable. Kashmiri, primarily written in the Perso-Arabic (Nastaliq) script, poses significant challenges due to its cursive structure, extensive use of ligatures, complex diacritical marks, and limited availability of annotated datasets. This research aims to address these challenges by developing a functional OCR system specifically tailored for Kashmiri text. The proposed system is built using the open-source Kraken OCR engine and leverages deep learning techniques with transfer learning from a pre-trained Arabic OCR model. A synthetic dataset was generated using Unicode Kashmiri text, enriched with Kashmiri-specific diacritics and exclusive characters, and rendered into images through automated text-to-image pipelines. Extensive preprocessing, augmentation, and iterative fine-tuning were performed to improve recognition accuracy. Model performance was evaluated using standard metrics such as Character Error Rate (CER) and Word Error Rate (WER) on both seen and unseen data. Experimental results demonstrate a substantial improvement over the initial model, with character accuracy increasing from 54.91% to 79.91% and word accuracy improving from 4.65% to 44.19%. The final model shows strong recognition capability for common and Arabic script characters, while Kashmiri-specific inherited diacritics remain a challenging area. In addition, a cross-platform user interface developed using Flutter enables users to upload or capture images and obtain digitized Kashmiri text through a simple and accessible workflow. Rather than proposing a new recognition architecture, this work contributes empirical insights, reproducible methodology, and error characterization for OCR in a previously unsupported low-resource Nastaliq language. 
This work is positioned as a baseline OCR system for printed Kashmiri Nastaliq text at the line level and does not claim state-of-the-art performance.
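
Character Error Rate and Word Error Rate are conventionally computed from the Levenshtein edit distance between a reference and a recognized line. The sketch below is a minimal pure-Python version and is not the evaluation code used in the study. Accuracy is often reported as 1 − CER, in which case a character accuracy of 79.91% would correspond to a CER of about 20.09%. Whether the reported figures were derived this way is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    print(cer("abcd", "abxd"))
    print(wer("a b c d", "a b x d"))
```

Python strings iterate over Unicode code points, so in this sketch a combining diacritic counts as its own character. That choice affects the CER for scripts with many diacritical marks.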

Open Access
Research article
A Low-Cost YOLOv5-Based System for Automated Classification of Maize Seed Translucency
André Rodrigue Tchamda,
Grisseur Henri Djoukeng,
Cabrel Nankap Kapnang,
Julius Kewir Tangka
Available online: 05-15-2026

Abstract

The physical quality of seeds is a critical determinant of sorting efficiency and crop productivity, yet conventional assessment approaches are often labor-intensive, invasive, and time-consuming. To address these limitations, computer vision-based methods have been increasingly adopted; however, most existing techniques rely primarily on reflected visible light, thereby capturing only surface-level features and limiting the detection of internal defects. In this study, a low-cost imaging system integrating both reflection and transmission of visible light was developed to enhance the characterization of maize seed translucency. By enabling simultaneous acquisition of information from the two principal faces of white maize seeds, a more comprehensive representation of both external morphology and internal structural variations was achieved. A comparative analysis was conducted between the conventional reflection-based method and the proposed imaging approach, with correlation coefficients between seed faces determined as 0.62 and 0.84, respectively, indicating a substantial improvement in feature consistency and information richness. A dedicated dataset was subsequently constructed using both imaging techniques and employed to train a YOLOv5s-based detection model over 200 epochs. The classification performance demonstrated a marked enhancement, with the proposed method achieving an accuracy of 93.07%, compared to 81.5% obtained using the conventional approach. Furthermore, real-time detection capability was validated through the implementation of the optimized imaging system, in which improved inference stability and robustness were achieved under practical operating conditions. The results indicate that the integration of transmission with reflection imaging provides a cost-effective and reliable solution for non-destructive seed quality assessment, offering significant potential for scalable deployment in agricultural sorting systems.
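
The comparison between seed faces rests on a correlation coefficient. The abstract does not say which coefficient was used, so the sketch below assumes Pearson's r between one scalar feature measured on each face of the same seeds. The feature values are invented toy numbers, not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical mean-intensity feature for five seeds, one value per face.
    face_a = [0.42, 0.55, 0.61, 0.70, 0.83]
    face_b = [0.40, 0.58, 0.57, 0.74, 0.80]
    print(round(pearson(face_a, face_b), 3))
```

A higher coefficient between the two faces indicates that the imaging setup captures more consistent information about the same seed, which is how the 0.62 and 0.84 values are read above.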

Abstract

In mobility-aware scenarios such as vehicular networks, mobile augmented reality (AR)/virtual reality (VR) services, and other latency-sensitive Multi-access Edge Computing (MEC) applications, continuous user movement leads to frequent migrations of service function chains (SFCs). Traditional approaches typically rely on global deployment comparisons, which fail to accurately identify the specific virtual network functions (VNFs) that require migration and their optimal target nodes. This limitation often results in redundant migrations, inefficient resource utilization, and an increased risk of service disruption, thus hindering the balance between latency assurance and resource efficiency. To overcome these limitations, this paper proposed a graph-enhanced deep reinforcement learning–based adaptive migration optimization (DRL-GAMO) framework. By integrating the topological representation capability of graph neural networks (GNNs) with the decision-making efficiency of deep reinforcement learning (DRL), DRL-GAMO established a topology–resource–decision mapping that jointly optimized VNF selection and determination of target nodes. This pre-migration decision process effectively reduced redundant operations and directed migration behaviors toward resource-efficient strategies. The designed reward function minimized migration overhead under service-level agreement (SLA) latency constraints and penalized downtime to maintain service continuity. Simulation results demonstrated that DRL-GAMO achieved stable service latency, lower resource consumption, and shorter migration time while reducing migration volume by more than 40% compared with DRL-ADMO, thereby improving the migration success rate and validating its effectiveness in MEC environments.
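
The reward design described above (low migration overhead, an SLA latency constraint, and a downtime penalty) can be sketched as a simple scalar function. This is not the DRL-GAMO reward. The functional form, the weights, and every name below are hypothetical and chosen only to show how the three terms might be combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationOutcome:
    """Hypothetical result of one migration decision."""
    migration_cost: float  # overhead, e.g. amount of state transferred
    downtime_s: float      # service interruption in seconds
    latency_ms: float      # end-to-end SFC latency after migration


def reward(outcome, sla_latency_ms, w_cost=1.0, w_downtime=5.0, sla_penalty=100.0):
    """Higher is better: cheap migrations that keep the SLA and avoid downtime.

    Assumption: a weighted sum with a fixed penalty for SLA violations.
    """
    value = -w_cost * outcome.migration_cost - w_downtime * outcome.downtime_s
    if outcome.latency_ms > sla_latency_ms:
        # Treat the latency constraint as a large fixed penalty.
        value -= sla_penalty
    return value


if __name__ == "__main__":
    within_sla = MigrationOutcome(migration_cost=2.0, downtime_s=0.5, latency_ms=18.0)
    violates_sla = MigrationOutcome(migration_cost=2.0, downtime_s=0.5, latency_ms=25.0)
    print(reward(within_sla, sla_latency_ms=20.0))
    print(reward(violates_sla, sla_latency_ms=20.0))
```

Making the SLA term a fixed penalty rather than a proportional one is a design choice of this sketch. It makes any violating action strictly worse than the same action within the latency bound.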
