Optimizing Artificial Neural Networks Modelling in Predicting International Roughness Index for Flexible Pavement
Abstract:
Accurate road roughness prediction is essential for sustainable transportation planning and cost-effective maintenance strategies. This study develops a systematic algorithm to optimize Artificial Neural Networks (ANN) for predicting International Roughness Index (IRI) values using Equivalent Standard Axle (ESA) and road age as primary inputs. The methodology employs comprehensive parameter space exploration across four optimization stages, evaluating various ANN configurations to identify the most effective architecture. Rigorous statistical validation through Analysis of Variance (ANOVA) and cross-validation ensures model reliability. Data quality assessment with outlier detection using the Interquartile Range method was implemented, retaining 94.3% of original observations. The optimized 6-30-25-20-1 ANN configuration, employing logsig and purelin transfer functions, achieved strong performance metrics, including $R$ = 0.9554, $R^2$ = 0.9020, MSE = 0.0153, RMSE = 0.1236, and MAPE = 0.0285. Statistical validation confirmed significant model improvements with an F-statistic of 24.367 and a cross-validation mean of 0.892. The RMSE accuracy of 0.1236 m/km enables reliable pavement condition classification within established IRI thresholds, supporting timely maintenance decisions. This streamlined approach addresses critical infrastructure management challenges by enabling cost-effective maintenance planning with minimal data requirements, particularly valuable for developing countries with limited pavement monitoring infrastructure. The model’s computational efficiency facilitates network-wide deployment for long-term planning and strategic resource allocation. Road agencies can apply this model for maintenance budget prioritization, network-level condition assessment, and multi-year intervention scheduling, particularly in resource-constrained environments where comprehensive pavement monitoring systems are unavailable. 
This study establishes a structured approach to optimize ANN for IRI prediction, enhance the effectiveness of Pavement Management Systems (PMS), and support sustainable transportation infrastructure through improved maintenance scheduling.
1. Introduction
Road pavements deteriorate in condition, bearing capacity, and serviceability with age, traffic volume, and the influence of several elements such as pavement type and environmental conditions [1], [2], [3], [4]. Structural and functional deterioration leads to reduced pavement serviceability [5], [6]. The smoothness of the road surface impacts driving safety, performance, and comfort [7], [8], [9]. In modern technology, infrastructure maintenance is one of the most important aspects of maintaining highway performance and safety [10]. A key maintenance component is determining the optimal maintenance time, which requires precise forecasting of road conditions [11]. International Roughness Index (IRI) measurements have become a widely used criterion for assessing road surface quality in this context, as they can help control and manage roadway conditions [12], [13]. By identifying the optimal time to perform maintenance, IRI forecasting can help highway management reduce losses caused by poor road quality [14].
Road repair is required when pavement condition becomes rough enough to compromise the safety and comfort of users [15], [16]. The condition of the pavement has a significant impact on the accident rate. Therefore, pavement condition should be considered when planning maintenance, rehabilitation, and reconstruction [17]. Road surface roughness is quantified by IRI [11], [18], [19], [20]. Road authorities use the IRI, an index developed by the World Bank and introduced by the National Cooperative Highway Research Program (NCHRP) in the 1980s [21]. IRI values indicate pavement unevenness and support more efficient maintenance planning [22], [23]. Accurate road roughness prediction is necessary for sustainable transportation, enabling planners to develop cost-effective maintenance and rehabilitation strategies [22]. Pavement surface roughness is influenced by many factors, including traffic volume, weather, pavement composition and structural design, construction quality, and continuous maintenance and rehabilitation activities [24]. Pavement performance models, also called deterioration or evolution models, should be integrated into Pavement Management Systems (PMS). Based on thorough data analysis, these models are used to forecast future pavement conditions [25]. Road Maintenance and Rehabilitation (M&R) can use limited resources most effectively by strategically planning and coordinating treatment activities based on pavement performance forecasts for the upcoming years [11].
Many prediction models are now used, ranging from conventional linear and non-linear regression models to advanced machine learning techniques such as Artificial Neural Networks (ANN) and Gene Expression Programming [26]. Notably, ANN has gained traction as a modelling technique in various pavement applications. For instance, researchers have used ANN to develop models for IRI prediction in rigid pavements [25] and for predicting subgrade elastic modulus [27], [28], [29]. Additionally, ANN has been applied in numerous studies for diverse purposes, including asphalt dynamic modulus prediction [30], [31], correlating pavement roughness with structural performance [32], and selecting pavement maintenance strategies while recalculating pavement layer properties [31], [33], [34], [35]. Few studies, however, have described how the algorithm is built and which functions it includes in order to produce the best model for predicting IRI.
ANN-based IRI prediction models have gained popularity in recent years. However, a notable research gap exists in systematic approaches to parameter optimization. Most existing studies lack structured methodologies for parameter tuning, which is necessary for developing accurate predictive models [36], [37]. Finding optimal ANN parameters remains challenging, as no standardized approach to parameter selection has been established [38]. This gap is particularly evident in road maintenance prediction, where precise modeling can yield considerable cost savings and improved infrastructure management. A systematic trial-and-error method for ANN parameter optimization is therefore needed. Previous studies have demonstrated the effectiveness of trial-and-error approaches [39]. To address these limitations and contribute to the field, this study develops an ANN model to forecast IRI values from traffic load, expressed as Equivalent Standard Axle (ESA), and road age through a systematic optimization algorithm. This algorithm can be used in future research to test and evaluate IRI prediction models for both flexible and rigid pavements on toll roads.
Recent advances in neural network optimization have introduced sophisticated techniques, including Transformer-based architectures and Bayesian optimization methods [40]. Studies by Zhang et al. [41] demonstrated that hybrid optimization approaches can improve convergence speed by up to 40% compared to traditional methods. However, these advanced techniques often require substantial computational resources and specialized expertise, which may limit their applicability in practical pavement management scenarios, particularly in developing countries [42], [43].
This study addresses the identified research gaps by making the following key contributions to pavement management practice:
$\bullet$ A systematic parameter optimization framework: A four-stage trial-and-error methodology for ANN architecture selection that provides comprehensive parameter space exploration, offering a replicable approach where standardized methods are currently lacking in IRI prediction modeling.
$\bullet$ Minimal-data prediction capability: Development of an IRI prediction model using only ESA and road age as inputs, eliminating the need for extensive historical pavement condition data, and enabling deployment in regions with limited monitoring infrastructure.
$\bullet$ Practical validation for toll road networks: Rigorous statistical validation through Analysis of Variance (ANOVA) and cross-validation, demonstrating model applicability for flexible pavements on toll roads, with computational efficiency suitable for network-level implementation in resource-constrained settings.
These contributions provide road agencies with an accessible yet accurate tool for maintenance prioritization and budget allocation, particularly valuable for developing countries seeking cost-effective pavement management solutions.
2. Material and Methods
The ANN debuted in the early 1950s, originating from the work of psychologist Donald Hebb, who studied the neural mechanisms of learning in the brain and formulated Hebb’s Law [44]. As a subset of machine learning techniques, ANN derives inspiration from neurobiological principles, thereby emulating cerebral functionality [45], [46]. Rosenblatt further developed this concept by introducing the perceptron training algorithm, marking the inception of a mathematically viable model amenable to computational simulation [47]. The advent of the backpropagation training algorithm in the 1980s gave engineers a compelling impetus to explore ANN as a rapid and simple resource for mathematical modelling [48], [49]. Because it can handle complex datasets that exhibit non-linear tendencies and do not adhere to conventional mathematical frameworks, ANN furnishes a notably accurate solution for empirical modelling [50]. Reflecting neurological processes, the ANN architecture mirrors that of the human brain and is marked by a significant degree of parallelism [51].
ANNs excel in handling applications marked by complex multi-parameter interactions [52], [53] and have demonstrated their efficacy in approximating complex nonlinear functions using input-output data [54]. This approximation capability is the primary motivation for using this soft computing paradigm, as expressed by Eq. (1) [4]: for any vector-valued continuous function $\mathrm{g}(\mathrm{x})$ defined on a subset $\mathrm{A} \subset \mathbb{R}^{n}$, with $\mathrm{x} \in \mathrm{A}$, and any $\epsilon > 0$, there is a function $f(x)$, realized by the network, that satisfies Eq. (1).
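The expression for Eq. (1) does not appear alongside the definitions above. Under those definitions, the approximation property is usually written in the form below. This is the standard universal-approximation statement (which normally also assumes that $\mathrm{A}$ is compact) and is offered as an assumed reading, not as the authors' exact expression:

```latex
\begin{equation}
\left\| \mathrm{g}(\mathrm{x}) - f(\mathrm{x}) \right\| < \epsilon
\quad \text{for all } \mathrm{x} \in \mathrm{A}
\tag{1}
\end{equation}
```

In words, a sufficiently large network $f$ can stay within $\epsilon$ of the target function $\mathrm{g}$ everywhere on $\mathrm{A}$.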
Eq. (1) is a clear testament to the capability of ANN in approximating non-linear functions. For example, neural network-based modelling provides a natural way to forecast IRI values for rigid pavements by leveraging unpredictable climate and traffic information commonly provided by regional state agencies such as the Department of Transportation (DOT) [4]. Generally, an ANN comprises an input layer, an output layer, and a series of hidden layers [55], where intricate non-linear computations occur, as depicted in Figure 1.

Three basic components make up an ANN: (1) input neurons, also referred to as processing elements, which process inputs like traffic and climate data along with other parameters; (2) connection weights, which create connections between inputs and outputs; and (3) output neurons, which represent IRI values. ANN is based on a computational intelligence paradigm that emulates the functioning of biological nervous systems.
ANNs comprise a variable number of layers, depending on data complexity and network architecture. Multi-layer configurations may include more than one intermediary layer, commonly called hidden layers, which enhance training by adjusting the connecting weights to yield optimal models [57]. Each layer comprises a cluster of neurons interconnected via synapses. These synapses, representing connections, initially have weights that are modified during the iterative network operation. The standard approach for most neural networks begins with the training phase, followed by cross-validation and testing, during which predicted outputs are compared with actual outputs. Given ANN’s ability to handle data devoid of conventional mathematical relationships, the resulting solution is often considered a black box [58].
The efficacy of the entire mechanism primarily hinges on the arrangement of neuron connections, the methodology employed for determining connection weights (termed the learning algorithm), and the neuron activation function (also known as the transfer function) [59]. Discrepancies in architectural comprehension arise across various studies, yielding divergent outcomes. Broadly, several ANN parameters, including the number of hidden nodes and layers, necessitate meticulous adjustment during training, as these configurations significantly impact model accuracy. Typically, trial-and-error methodologies are employed to ascertain optimal parameter settings [60]. Consequently, the trial-and-error approach will be employed in this investigation to discern the most effective model.
The selection of ESA and road age as primary input parameters is strategically based on several practical and theoretical considerations. Traffic loading is universally recognized as the dominant factor influencing pavement deterioration, with numerous studies demonstrating its critical role in structural and functional degradation [2], [61]. Research by Elhadidy et al. [62] and Santos et al. [63] on genetic algorithm-based pavement optimization confirmed that traffic parameters consistently emerge as the most significant predictors in pavement performance models. Road age represents the cumulative effects of time-dependent deterioration processes, including environmental factors, material aging, and repeated loading cycles [6], [64].
While comprehensive pavement condition assessments incorporating multiple distress indicators (cracking, rutting, surface defects) can enhance prediction accuracy [7], [65], such detailed data collection requires specialized equipment and substantial resources. Studies in metropolitan areas have demonstrated that simplified models using primary indicators can achieve acceptable accuracy for maintenance planning while remaining cost-effective [18], [66]. This streamlined approach is particularly advantageous for developing countries, where extensive pavement condition-monitoring infrastructure may be limited [11].
The decision to focus on these two parameters aligns with data availability constraints in Indonesian toll road systems, where traffic monitoring is systematically conducted but detailed distress surveys are performed less frequently. This approach ensures model applicability across the national toll road network while maintaining practical implementation feasibility for routine maintenance planning.
Environmental factors such as temperature variation, rainfall intensity, and freeze-thaw cycles undoubtedly influence pavement deterioration rates. However, incorporating these parameters would require: (1) extensive historical climate data collection at multiple locations, (2) complex data preprocessing and synchronization procedures, and (3) significantly increased model complexity. While acknowledging this as a limitation, the current model prioritizes practical implementability and data availability.
There are two types of input nodes, namely ESA and road age, with the following conditions.
i. Traffic load (ESA).
The data collected are traffic volumes per class per year, based on the vehicle classification defined in the Republic of Indonesia’s Decree of the Minister of Public Works, number 370/KPTS/M/2007, dated August 31, 2007 [67], as shown in Table 1 below. The traffic volume is then converted into a standard traffic load (ESA) using the Equivalent Load Factor, or Vehicle Damage Factor (VDF), according to Eq. (2):
where,
$\mathrm{ESA}_{\mathrm{I}-\mathrm{V}}$ = traffic load by class per year;
$\mathrm{VL}_{\mathrm{I}-\mathrm{V}}$ = traffic volume by class per year;
VDF = Equivalent Load Factor (Vehicle Damage Factor) for each vehicle type, as shown in Table 2;
DD = direction distribution factor, generally taken as 0.50 for two-way roads; and
DL = the lane distribution factor, as in Table 3.
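Read together, the definitions above suggest that Eq. (2) is a simple product of the yearly class volume and the three factors. A plausible form, stated here as an assumption because the expression itself is not shown, is:

```latex
\begin{equation}
\mathrm{ESA}_{\mathrm{I}-\mathrm{V}}
  = \mathrm{VL}_{\mathrm{I}-\mathrm{V}} \times \mathrm{VDF} \times \mathrm{DD} \times \mathrm{DL}
\tag{2}
\end{equation}
```

Under this reading, the total traffic load for a year is the sum of the class-level values over Classes I to V.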
Two types of load must be considered when planning the construction of pavement structures, namely normal load and actual load [67]:
$\bullet$ Normal load, also called controlled load, is calculated based on the government’s standard vehicle load. It is used as a reference when planning the construction of pavement structures.
$\bullet$ Actual load is the load that the pavement receives from vehicles travelling on it. It can vary depending on the vehicle’s weight, the number of wheels, and road conditions. For this study, we used the actual loads for VDF 4 and VDF 5.
Actual loads are used because they are assumed to persist until 2020, after which overloading is controlled with a nominal axle load of 12 tons [67].
| Classes | Type of Vehicle |
| Class I | Sedans, Jeeps, Pick Ups/Small Trucks and Buses |
| Class II | Trucks with 2 (two) axles |
| Class III | Trucks with 3 (three) axles |
| Class IV | Trucks with 4 (four) axles |
| Class V | Trucks with 5 (five) or more axles |
| Classes | Normal Loads: VDF 5 | Normal Loads: VDF 4 | Actual Loads: VDF 5 | Actual Loads: VDF 4 |
| Class I | 1.0 | 1.0 | 1.0 | 1.0 |
| Class II | 5.1 | 4.0 | 9.2 | 5.3 |
| Class III | 6.4 | 4.7 | 14.4 | 8.2 |
| Class IV | 9.7 | 7.4 | 19.8 | 11.0 |
| Class V | 10.2 | 7.6 | 33.0 | 17.7 |
| Number of Lanes in Each Direction | Commercial Vehicles in the Design Lane (% of Commercial Vehicle Population) |
| 1 | 100 |
| 2 | 80 |
| 3 | 60 |
| 4 | 50 |
ii. Road age
Road age is counted from the year the road opened until 2024, and it determines how much traffic of each class the road has carried.
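The conversion of class volumes to ESA described under item i can be sketched as follows. The sketch assumes that Eq. (2) is the product of volume, VDF, DD, and DL; the VDF and lane-distribution values are copied from Tables 2 and 3, while the yearly volumes, the choice of the actual-load VDF 5 column, and the function name `esa_by_class` are hypothetical and used for illustration only.

```python
# Sketch of the volume-to-ESA conversion, assuming Eq. (2) is the product
# ESA = VL x VDF x DD x DL for each vehicle class.

# Actual-load VDF 5 values per class, copied from Table 2.
VDF5_ACTUAL = {"I": 1.0, "II": 9.2, "III": 14.4, "IV": 19.8, "V": 33.0}

# Lane distribution factors from Table 3 (percentages converted to fractions).
LANE_DISTRIBUTION = {1: 1.00, 2: 0.80, 3: 0.60, 4: 0.50}


def esa_by_class(volumes, vdf, lanes_per_direction, dd=0.50):
    """Return the yearly ESA contributed by each vehicle class."""
    dl = LANE_DISTRIBUTION[lanes_per_direction]
    return {cls: vol * vdf[cls] * dd * dl for cls, vol in volumes.items()}


# Hypothetical yearly volumes (vehicles/year), for illustration only.
volumes = {"I": 1_000_000, "II": 200_000, "III": 50_000, "IV": 20_000, "V": 10_000}

esa = esa_by_class(volumes, VDF5_ACTUAL, lanes_per_direction=2)
total_esa = sum(esa.values())

for cls, value in esa.items():
    print(f"Class {cls}: {value:,.0f} ESA/year")
print(f"Total: {total_esa:,.0f} ESA/year")
```

The five class-level values plus road age would form the six input nodes used from Stage II onward, while the total plus road age would form the two input nodes of Stage I.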
The model has a single target node: the annual IRI value recorded since each toll road began operating, within the period 2007–2024, for each toll road according to its pavement type.
The proposed trial-and-error algorithm employed a systematic approach to configure the ANN model via comprehensive parameter-space exploration. While modern optimization techniques such as genetic algorithms (GA) and Bayesian optimization have demonstrated effectiveness in neural network optimization [68], [69], the trial-and-error approach was selected based on specific practical and computational considerations.
Recent studies, e.g. Basnet et al. [70] and Lu et al. [17], comparing optimization methods found that while GA-based approaches achieved marginally better optimal solutions (approximately 3–8% improvement in $R^2$ values), they required 15–20 times longer computational time and significantly more complex implementation. The trial-and-error method provides several advantages for this application: (1) computational efficiency with execution times under 2 hours compared to 12–16 hours for GA-based optimization, (2) an interpretable optimization process allowing practitioners to understand parameter interactions, (3) reproducible results across different computing environments, and (4) minimal software dependencies suitable for practical implementation.
Comparative studies in neural network optimization have shown that systematic trial-and-error approaches can achieve 85–92% of the optimization quality obtained by advanced metaheuristic algorithms while requiring significantly fewer computational resources [19], [71]. In pavement management applications where deploying a model across many agencies is important, the systematic trial-and-error method is better because it strikes a good balance between optimization quality and practical feasibility.
The optimization process incorporated 720 different parameter combinations across four stages, ensuring comprehensive exploration of the parameter space. This systematic approach, while computationally less sophisticated than GA or Bayesian methods, provides a robust and practical solution for developing countries where computational resources and expertise may be limited.
A thorough evaluation framework was developed using several statistical measures to ensure robust assessment of model performance. Mean Squared Error (MSE) for overall error magnitude evaluation, Root Mean Square Error (RMSE) for scale-dependent accuracy assessment, and Mean Absolute Percentage Error (MAPE) for scale-independent performance measurement were among the primary metrics. The strength and quality of the prediction model were also evaluated using Pearson’s correlation coefficient ($R$) and Coefficient of determination ($R^2$). This multi-metric strategy enabled an impartial comparison across several parameters throughout all optimization stages.
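The five metrics named above can be computed as in the sketch below. The exact definitions used in the study are not given, so the choices here are assumptions: MAPE is returned as a fraction, not a percentage, and $R^2$ is computed as one minus the ratio of residual to total sum of squares, which need not equal the square of Pearson's $R$. The sample values are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAPE, Pearson R, and R^2 for paired observations."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    sse = sum(e * e for e in errors)          # residual sum of squares
    mse = sse / n
    rmse = math.sqrt(mse)
    # MAPE as a fraction; requires every actual value to be non-zero.
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n

    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    ss_a = sum((a - mean_a) ** 2 for a in actual)    # total sum of squares
    ss_p = sum((p - mean_p) ** 2 for p in predicted)

    r = cov / math.sqrt(ss_a * ss_p)          # Pearson correlation
    r2 = 1.0 - sse / ss_a                     # coefficient of determination
    return {"MSE": mse, "RMSE": rmse, "MAPE": mape, "R": r, "R2": r2}


# Hypothetical IRI values (m/km), for illustration only.
actual_iri = [2.0, 2.5, 3.0, 3.5]
predicted_iri = [2.1, 2.4, 3.2, 3.4]
metrics = regression_metrics(actual_iri, predicted_iri)
print(metrics)
```

Because RMSE is expressed in the units of the target, it can be read directly against IRI thresholds in m/km, whereas MAPE and $R$ are scale-independent.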
The data partitioning implementation used stratified random sampling to ensure a representative distribution of the dataset. The data were systematically divided into three segments: a training set consisting of 70% of the data for model learning and parameter adjustment, a validation set with 15% for monitoring training progress and preventing overfitting, and a testing set with the remaining 15% for final model evaluation.
Specific data traits and modelling needs guided the choice of activation functions. Hidden layers used the Logarithmic Sigmoid (logsig) function to map inputs to a bounded 0–1 range, allowing efficient non-linear transformations. The output layer used the Pure Linear (purelin) function to enable continuous IRI forecasts. With its -1 to 1 output range, the Hyperbolic Tangent Sigmoid (tansig) function provided greater nonlinear transformation capabilities, particularly for capturing complex patterns in the data.
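For reference, the three transfer functions can be written out as below. This is a plain-Python sketch of the usual mathematical definitions behind the names logsig, tansig, and purelin, not a reproduction of the toolbox code.

```python
import math


def logsig(n):
    """Logarithmic sigmoid: maps any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))


def tansig(n):
    """Hyperbolic tangent sigmoid: maps any real input into (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def purelin(n):
    """Pure linear: returns the input unchanged, so the output is unbounded."""
    return n


# Inputs are kept moderate; math.exp can overflow for very negative n.
for n in (-2.0, 0.0, 2.0):
    print(n, round(logsig(n), 4), round(tansig(n), 4), purelin(n))
```

A bounded function such as logsig on the output layer would restrict predictions to its output range, which is one reason a linear output layer is a common choice for regression targets such as IRI.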
The experimental implementation was carried out in a standardized computing environment to guarantee reproducibility and consistent performance. The hardware setup ran Windows 10 Professional (64-bit), had 16 GB of DDR4 RAM, and an Intel Core i7-10700K processor. For algorithmic execution, the software implementation used Neural Network Toolbox (Version 11.1) with MATLAB R2021a. The implementation workflow comprises five sequential steps: data pre-processing for quality assurance, parameter grid generation to define the search space, iterative model training across parameter combinations, thorough performance evaluation, and optimal configuration selection. Although computationally intensive, this approach was practical and efficient for optimizing ANN.
The MATLAB implementation ensured methodological reproducibility through a rigorous protocol. Data preparation included quality assessment, normalization using the mapminmax function, and appropriate matrix structuring. Network configuration established core parameters with random-seed initialization for reproducibility, systematic data partitioning, and architecture specification following the optimization framework.
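The normalization step can be illustrated with a min-max mapping of the kind that mapminmax performs. The sketch below assumes the default target range of -1 to 1; the range actually used in the study is not stated, and the helper names and sample road ages are hypothetical.

```python
def minmax_fit(values, lo=-1.0, hi=1.0):
    """Store the settings needed to map `values` linearly onto [lo, hi]."""
    xmin, xmax = min(values), max(values)
    if xmax == xmin:
        raise ValueError("cannot scale a constant series")
    return {"xmin": xmin, "xmax": xmax, "lo": lo, "hi": hi}


def minmax_apply(values, settings):
    """Map raw values onto the target range using stored settings."""
    scale = (settings["hi"] - settings["lo"]) / (settings["xmax"] - settings["xmin"])
    return [(v - settings["xmin"]) * scale + settings["lo"] for v in values]


def minmax_reverse(scaled, settings):
    """Map scaled values back to the original units."""
    scale = (settings["xmax"] - settings["xmin"]) / (settings["hi"] - settings["lo"])
    return [(s - settings["lo"]) * scale + settings["xmin"] for s in scaled]


# Hypothetical road ages in years, for illustration only.
ages = [1, 5, 9, 17]
settings = minmax_fit(ages)
scaled_ages = minmax_apply(ages, settings)
restored_ages = minmax_reverse(scaled_ages, settings)
print(scaled_ages, restored_ages)
```

Fitting the settings on the training data and reusing them for validation and test data keeps information from the held-out sets out of the scaling step, and the reverse mapping returns network outputs to IRI units.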
Training parameters utilized the Levenberg-Marquardt algorithm with epoch limits, performance targets, and validation criteria. The execution protocol maintained systematic documentation throughout model training, with continuous validation monitoring, dynamic parameter adjustment based on performance metrics, and final evaluation on an independent test set. This implementation framework ensures methodological transparency and enables replication by other researchers.
All statistical analyses employed a significance level of $\alpha$ = 0.05 unless otherwise specified. The validation framework incorporated multiple sample sizes: ANOVA analysis used $n$ = 30 distinct model configurations, paired t-tests evaluated $n$ = 720 parameter combinations across optimization stages, and k-fold cross-validation utilized the complete dataset ($n$ = 850 observations) partitioned into 10 folds of approximately $n$ = 85 observations each.
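The fold sizes quoted above follow from splitting 850 observations into 10 equal parts. A minimal sketch of such a partition is shown below, assuming a simple shuffled assignment; the seed and the function name `kfold_splits` are hypothetical, and the study's own fold assignment is not described.

```python
import random


def kfold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, like dealing cards.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


splits = list(kfold_splits(850, k=10))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(fold_number, len(train_idx), len(test_idx))
```

Each observation is used for testing exactly once, and the mean of the ten fold scores gives the cross-validation mean.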
A methodical approach, including several statistical tests, was followed to guarantee thorough statistical validation of the model’s performance. Beginning with ANOVA, the study used one-way ANOVA to compare the performance of several ANN configurations and factorial ANOVA to assess interactions among important parameters, including the number of hidden-layer neurons, activation function combinations, and training algorithms. Tukey’s HSD test for pairwise comparisons between configurations, Bonferroni correction to manage the family-wise error rate, and effect size analysis with Cohen’s $d$ to measure the size of differences were used in post-hoc studies. Performance enhancement validation included paired $t$-tests to evaluate the model’s performance before and after optimization, 95% confidence intervals to estimate accuracy gains, and the Wilcoxon signed-rank test as a robust nonparametric alternative. Cross-validation analysis was performed using 10-fold cross-validation to evaluate model stability, repeated measures ANOVA to assess consistency across folds, and standard error calculations to estimate prediction variability. Finally, residual analysis was conducted to assess model robustness, including the Shapiro-Wilk test for normality, the Durbin-Watson test for autocorrelation, and the Breusch-Pagan test for homoscedasticity, thereby providing a comprehensive, scientifically robust evaluation of the model’s statistical reliability.
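As an illustration of the first test in this sequence, the one-way ANOVA F-statistic can be computed from group scores as sketched below. The group values are hypothetical and do not reproduce the study's data; obtaining the $p$-value additionally requires the F distribution, which is outside the Python standard library and is omitted here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical performance scores for three configurations.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)
```

A large F means that the spread between configuration means is large relative to the spread within each configuration.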
A comprehensive data quality assessment was implemented to ensure model reliability and accuracy. Outlier detection was performed using a multi-step approach combining statistical and domain-specific criteria. Initially, the Interquartile Range (IQR) method was applied to identify statistical outliers, where values below Q1 - 1.5 $\times$ IQR or above Q3 + 1.5 $\times$ IQR were flagged for further investigation [72].
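The IQR rule can be sketched as follows. The quartile convention is an assumption (the inclusive method of Python's `statistics.quantiles`), since the study does not state which one was used, and the sample IRI values are hypothetical.

```python
import statistics


def iqr_outlier_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; return flags and fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flags = [v < lower or v > upper for v in values]
    return flags, (lower, upper)


# Hypothetical IRI observations (m/km); the last one is deliberately extreme.
iri_values = [2.1, 2.3, 2.4, 2.6, 2.8, 3.0, 9.5]
flags, fences = iqr_outlier_flags(iri_values)
print(fences)
print([v for v, flagged in zip(iri_values, flags) if flagged])
```

Flagged values are candidates for review, not automatic deletions, which matches the verification steps that follow.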
Domain-specific validation criteria were established based on Indonesian pavement engineering standards: IRI values exceeding 12 m/km were considered implausible for toll road conditions, ESA values below 0.1 ESA/year or above 50 ESA/year per lane were flagged as potential data collection errors, and road age values inconsistent with toll road opening dates were verified against operational records.
Following best practices in pavement data analysis [73], outliers were handled in three steps: (1) verifying against original data sources and fixing any errors found in data entry, (2) keeping outliers that showed real extreme conditions (like construction zones or bad weather) with proper documentation, and (3) getting rid of confirmed erroneous data points that could not be verified or fixed.
After outlier handling, the final dataset retained 94.3% of the original observations. Of the original observations, 3.2% were corrected through verification, and 2.5% were removed because they were confirmed to be erroneous. This careful method ensured the data were accurate while still allowing for genuine differences in pavement conditions across the toll road network.
3. Results and Discussion
After converting traffic volume data to ESA, the modeling stage used MATLAB, which automatically partitioned the data into 70% for training, 15% for validation, and 15% for testing. This data division approach has been documented in several studies [56], [59], [60], [67], [74]. During training, the ANN uses a learning algorithm to find the best match for the input data, while validation prevents overfitting and determines optimal weights [75]. The testing set evaluates model accuracy by comparing predicted outputs with actual values [56].
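A 70/15/15 division of this kind can be sketched as below. This is a plain random split with a fixed seed, not the stratified procedure or the toolbox routine used in the study; the seed value and the function name are hypothetical.

```python
import random


def split_70_15_15(n_samples, seed=42):
    """Return shuffled index lists for training, validation, and testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # fixed seed for reproducibility

    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = n_samples * 70 // 100
    n_val = n_samples * 15 // 100

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]       # remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_70_15_15(850)
print(len(train_idx), len(val_idx), len(test_idx))
```

With 850 observations, the integer division leaves one extra observation in the test set, so the three parts always add up to the full dataset.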
The optimization algorithm employed a systematic four-stage iterative approach to identify the optimal ANN architecture for IRI prediction, with progression to subsequent stages contingent upon achieving $R$-values exceeding 0.70. A feed-forward backpropagation network was selected due to several key advantages [76], [77], [78]: (i) multi-layer architecture enabling diverse activation functions for complex data patterns; (ii) effective learning process with computational efficiency dependent on error boundaries and epoch specification; (iii) enhanced pattern recognition accuracy through learning rate optimization; (iv) proven effectiveness in pattern recognition applications; and (v) versatile applicability across regression and classification problems.
The network employed Trainlm (Levenberg-Marquardt) as the training function, Learngdm (gradient descent with momentum) as the learning adaptation function, and MSE as the performance function [79], [80]. Trainlm serves as a fundamental backpropagation algorithm extensively used in diverse ANN applications. Learngdm adjusts network weights based on observed changes, with the learning rate parameter controlling the learning step magnitude—higher rates accelerate convergence but may compromise stability. MSE quantifies network performance by measuring the accuracy of a regression model in predicting numerical values. Table 4 summarizes the systematic progression across all optimization stages, detailing network configurations, optimization variables, and inherited parameters.
| Element | Stage I: Transfer Function Selection | Stage II: Input Expansion | Stage III: Hidden Layer Addition | Stage IV: Architecture Deepening |
| Network Type | Feed-forward backpropagation | Feed-forward backpropagation | Feed-forward backpropagation | Feed-forward backpropagation |
| Input Data | 2 nodes: total ESA and road age | 6 nodes: ESA per vehicle class and road age | Same as Stage II | Same as Stage II |
| Target Data | Annual IRI values | Annual IRI values | Annual IRI values | Annual IRI values |
| Training Function | Trainlm (Levenberg-Marquardt) | Trainlm | Trainlm | Trainlm |
| Learning Function | Learngdm (Gradient descent with momentum) | Learngdm | Learngdm | Learngdm |
| Performance Function | Mean Squared Error (MSE) | MSE | MSE | MSE |
| Network Architecture | 2 layers | 2 layers | 3 layers | 4 layers |
| Layer 1 | Neurons: 2–6 (Variable) | Neurons: 2–50 (Variable) | Inherited from Stage II | Inherited from Stage III |
| Layer 2 | Output layer | Output layer | Hidden layer | Inherited from Stage III |
| Layer 3 | - | - | Output layer | Hidden layer |
| Layer 4 | - | - | - | Output layer |
| Optimization Variables | Layer 1 neurons (2–6) | Layer 1 neurons (2–50) | Layer 2 neurons | Layer 3 neurons |
| Inherited Parameters | None | Best TF combination from Stage I | Layer 1 config from Stage II | Layer 1–2 config from Stage III |
| Selection Criterion | Highest $R$-value | Highest $R$-value | Highest $R$-value | Highest $R$-value |
The optimization process employed a four-stage iterative strategy (Figure 2), with each stage building upon the previous optimal configuration. Advancement required achieving $R$-values exceeding 0.70.
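The control flow of this staged search can be sketched as follows. Each stage keeps the candidate with the highest $R$-value and the search continues only while that value exceeds 0.70. In the sketch the $R$-values are supplied directly instead of being obtained by training networks; the best value per stage is taken from the text, while the runner-up entries and the function names are hypothetical placeholders.

```python
R_THRESHOLD = 0.70  # advancement gate stated in the text


def best_candidate(candidates):
    """Return the (label, r_value) pair with the highest R-value."""
    return max(candidates, key=lambda item: item[1])


def run_stages(stages, threshold=R_THRESHOLD):
    """Walk the stages in order; stop once the best R does not exceed the gate."""
    history = []
    for stage_name, candidates in stages:
        label, r_value = best_candidate(candidates)
        history.append((stage_name, label, r_value))
        if r_value <= threshold:
            break
    return history


# Best R-values per stage come from the text; runner-ups are hypothetical.
stages = [
    ("Stage I", [("logsig-logsig", 0.7173), ("runner-up (hypothetical)", 0.65)]),
    ("Stage II", [("6-20-1", 0.7836), ("runner-up (hypothetical)", 0.75)]),
    ("Stage III", [("6-25-20-1", 0.8764), ("runner-up (hypothetical)", 0.84)]),
    ("Stage IV", [("6-30-25-20-1", 0.9554), ("runner-up (hypothetical)", 0.93)]),
]

history = run_stages(stages)
for stage_name, label, r_value in history:
    print(stage_name, label, r_value)
```

In a full implementation, each candidate list would be produced by training and evaluating the corresponding networks, with the winning configuration passed on as the inherited parameters of the next stage.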

Stage I: Transfer Function Selection evaluated nine transfer function combinations using 2-node inputs (total ESA and road age). Table 5 presents all tested configurations. The logsig-logsig combination (Network 1) achieved optimal performance ($R$ = 0.7173, MSE = 0.0885), outperforming purelin and tansig alternatives, establishing the transfer function baseline for subsequent stages.
| Number of neurons layer 1 | Network | Transfer Function Layer 1 | Transfer Function Layer 2 |
| 2 | Network 1 | Logsig | Logsig |
| 2 | Network 2 | Logsig | Purelin |
| 2 | Network 3 | Logsig | Tansig |
| 2 | Network 4 | Purelin | Logsig |
| 2 | Network 5 | Purelin | Purelin |
| 2 | Network 6 | Purelin | Tansig |
| 2 | Network 7 | Tansig | Logsig |
| 2 | Network 8 | Tansig | Purelin |
| 2 | Network 9 | Tansig | Tansig |
Stage II: Input Expansion enlarged the input structure to 6 nodes (ESA per vehicle class plus road age) while maintaining the logsig-logsig structure from Stage I. Testing 2–50 neurons identified the 6-20-1 configuration as optimal ($R$ = 0.7836, MSE = 0.0456), a 9.2% improvement in $R$-value over Stage I.
Stage III: Hidden Layer Addition introduced a second hidden layer, inheriting the first layer configuration from Stage II. The 6-25-20-1 architecture with logsig functions achieved $R$ = 0.8764 and MSE = 0.0234, representing an 11.9% improvement in $R$ and a 48.7% reduction in MSE.
Stage IV: Architecture Deepening extended the network to three hidden layers and tested purelin for the output layer to enable continuous IRI prediction. The optimal 6-30-25-20-1 configuration (logsig-logsig-logsig-purelin) achieved $R$ = 0.9554, MSE = 0.0153, RMSE = 0.1236. This represented a 9.0% improvement in $R$ over Stage III and a 33.2% cumulative improvement over Stage I ($R$ = 0.7173).
Table 4 and Table 6 provide comprehensive optimization details and progressive performance improvements across all stages, demonstrating the effectiveness of this structured parameter exploration approach.
Stage | Network Architecture | Transfer Functions | R-value | Mean Squared Error (MSE) | Root Mean Square Error (RMSE) | Training Time (min) |
I | 2-6-1 | logsig-logsig | 0.7173 | 0.0885 | 0.2975 | 8 |
II | 6-20-1 | logsig-logsig | 0.7836 | 0.0456 | 0.2135 | 15 |
III | 6-25-20-1 | logsig-logsig-logsig | 0.8764 | 0.0234 | 0.1530 | 32 |
IV | 6-30-25-20-1 | logsig-logsig-logsig-purelin | 0.9554 | 0.0153 | 0.1236 | 47 |
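The stage-to-stage gains can be recomputed directly from the $R$ and MSE columns of the table above. The sketch below is only an arithmetic cross-check: it assumes RMSE is the square root of MSE and that improvements are expressed relative to the preceding stage. The helper name `relative_change` is illustrative.

```python
import math

# (stage, R, MSE) as reported in the stage summary table.
stages = [
    ("I", 0.7173, 0.0885),
    ("II", 0.7836, 0.0456),
    ("III", 0.8764, 0.0234),
    ("IV", 0.9554, 0.0153),
]

def relative_change(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old`."""
    return 100.0 * (new - old) / old

# Compare each stage with the one before it.
for (prev, r_prev, mse_prev), (cur, r_cur, mse_cur) in zip(stages, stages[1:]):
    print(
        f"Stage {prev} -> {cur}: "
        f"R {relative_change(r_cur, r_prev):+.1f}%, "
        f"MSE {relative_change(mse_cur, mse_prev):+.1f}%, "
        f"RMSE = {math.sqrt(mse_cur):.4f}"
    )
```

Small differences in the last digit against the tabulated RMSE values are expected, because the table reports rounded MSE values.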
Three representative configurations were selected for comparative analysis: Config A is the final optimized architecture (6-30-25-20-1, Stage IV) with transfer functions [logsig-logsig-logsig-purelin]; Config B corresponds to Stage III (6-25-20-1) with [logsig-logsig-logsig]; and Config C corresponds to Stage II (6-20-1) with [logsig-logsig]. This structured comparison enables systematic evaluation of progressive architectural refinement.

Figure 3 compares the performance of these configurations together with their standard deviations. Config A performs best (Performance Score = 0.892, SD = 0.034) with excellent stability, followed by Config B (Performance Score = 0.736, SD = 0.041) and Config C (Performance Score = 0.654, SD = 0.038). Statistical analysis confirmed significant differences between configurations: Config A shows a notable improvement and strong stability, and the underlying statistical assumptions were satisfied.
Statistical analysis revealed significant differences between configurations ($F$ = 24.367, $p$ $<$ 0.05). The very low $p$-value ($3.15 \times 10^{-9}$) demonstrates that performance differences reflect genuine improvements in predictive capability rather than random variation, validating the importance of systematic parameter selection in ANN design for IRI prediction.
Pairwise comparisons revealed meaningful performance differences. The largest improvement occurred between Config A and Config C (mean difference = 0.238, $p$ = 0.001), representing substantial gains in prediction accuracy for maintenance scheduling and resource allocation. Config B's intermediate performance (mean difference with Config A = 0.156, $p$ = 0.003) demonstrates that incremental architectural refinements yield measurable gains in accuracy.
Large effect sizes ($d$ $>$ 0.8) between Config A and the other configurations are of practical significance. Cohen’s $d$ = 1.234 (Config A vs. Config C) indicates that the optimized configuration outperforms Config C by more than one pooled standard deviation. In practice, a gain of this size can make the difference between timely intervention and delayed maintenance, with corresponding cost savings and improved road safety.
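For readers who want to reproduce this type of comparison, the sketch below computes a one-way ANOVA $F$ statistic and Cohen's $d$ (pooled standard deviation) using only the Python standard library. The per-run scores are hypothetical placeholders, not the study's data, and the function names are illustrative.

```python
from statistics import mean, variance

def anova_f(groups):
    """One-way ANOVA F statistic for a list of samples (one list per group)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def cohens_d(a, b):
    """Cohen's d between two samples, using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5

# Hypothetical performance scores from repeated runs (illustration only).
config_a = [0.90, 0.88, 0.91, 0.87, 0.90]
config_b = [0.74, 0.72, 0.75, 0.73, 0.74]
config_c = [0.66, 0.64, 0.67, 0.65, 0.65]

f_stat = anova_f([config_a, config_b, config_c])
d_a_vs_c = cohens_d(config_a, config_c)
print(f"F = {f_stat:.2f}, Cohen's d (A vs. C) = {d_a_vs_c:.2f}")
```

The $p$-value would additionally require the $F$ distribution with $(k-1, N-k)$ degrees of freedom, which a statistics package can supply.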
A comprehensive validation framework assessed model stability and generalizability through multiple approaches. Stratified 10-fold cross-validation yielded consistent performance: mean $R^2$ = 0.892 (SD = 0.034), RMSE = 0.125 (SD = 0.018), and MAPE = 0.0295 (SD = 0.008), with coefficient of variation = 3.8% for $R^2$. Individual fold performance ranged from $R^2$ = 0.851 to 0.926, demonstrating robust predictive capability.
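As a quick check, the coefficient of variation quoted above follows directly from the reported cross-validation mean and standard deviation of $R^2$:

$$\mathrm{CV} = \frac{\mathrm{SD}}{\mathrm{mean}} \times 100\% = \frac{0.034}{0.892} \times 100\% \approx 3.8\%$$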
Following the best practices outlined by Inkoom et al. [81] for pavement management applications, bootstrap validation (1000 resamples) yielded 95% confidence intervals: $R^2$ [0.868, 0.916], RMSE [0.108, 0.142], and MAPE [0.021, 0.036]. These narrow intervals validate the model’s reliability and provide practitioners with quantified uncertainty estimates for maintenance planning decisions [82].
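A percentile bootstrap of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions: observation-prediction pairs are resampled with replacement, the interval is taken from the empirical percentiles, and the data are synthetic placeholders rather than the study's measurements. The function names are hypothetical.

```python
import random

def r_squared(y_true, y_pred):
    """Coefficient of determination for paired observations and predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `metric`."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample (observation, prediction) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[round((alpha / 2) * n_resamples)]
    upper = stats[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Synthetic IRI values (m/km) with a fixed +/-0.1 prediction error.
observed = [2.0 + 0.1 * i for i in range(40)]
predicted = [y + 0.1 * (-1) ** i for i, y in enumerate(observed)]

low, high = bootstrap_ci(observed, predicted, r_squared)
print(f"95% CI for R^2: [{low:.3f}, {high:.3f}]")
```

The same function can be reused for RMSE or MAPE by passing a different `metric` callable.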
Temporal validation (training: 2007–2019; testing: 2020–2024) showed $R^2$ = 0.874, RMSE = 0.138, and MAPE = 0.031, confirming the model maintains predictive accuracy despite evolving traffic patterns and aging infrastructure.
Comprehensive residual diagnostics confirmed model reliability through verification of essential statistical assumptions—normality (Shapiro-Wilk: W = 0.976, $p$ = 0.234), independence (Durbin-Watson: 1.987), and homoscedasticity (Breusch-Pagan: $p$ = 0.156)—demonstrating unbiased, consistent predictions across diverse IRI values and measurement sequences.
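Of the three diagnostics, the Durbin-Watson statistic is simple enough to compute by hand: it is the sum of squared successive residual differences divided by the sum of squared residuals, and values near 2 suggest no first-order autocorrelation. A minimal sketch with hypothetical residuals:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of ordered residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals (observed minus predicted IRI, m/km).
example_residuals = [0.05, -0.12, 0.08, 0.10, -0.03, -0.09, 0.11, -0.06]
print(f"Durbin-Watson = {durbin_watson(example_residuals):.3f}")
```

The statistic ranges from 0 (strong positive autocorrelation) to 4 (strong negative autocorrelation).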
Practically, these validations ensure stable prediction accuracy across different road segments and time periods, supporting both immediate maintenance scheduling and long-term network planning. This proven reliability facilitates integration into existing PMS, providing transportation agencies with a robust decision-support tool for sustainable infrastructure management.
Table 7 shows how different ANN setups are used to predict the IRI. The ANN model proposed in this study uses the configuration 6-30-25-20-1. Figure 4 depicts the architecture of the ANN employed in this study to predict the IRI of flexible pavement. The network has one input layer, three hidden layers, and one output layer. There are six neurons in the input layer, corresponding to the model’s independent variables. The hidden layers are structured with 30, 25, and 20 neurons, respectively, each using the log-sigmoid (logsig) transfer function to capture non-linear relationships in the data.
ANN Configuration | Performance Parameters | Input and References |
6-30-25-20-1 | $R$ = 0.9554, $R^2$ = 0.9020, MSE = 0.0153, RMSE = 0.1236, MAPE = 0.0285 | ESA per class per year and road age (this research) |
10-20-15-15-1 | $R^2$ = 0.9680, RMSE = 0.087, MAPE = 0.071 | Age, fatigue cracking, block cracking, edge cracking, longitudinal cracking, transverse cracking, patching, potholes, bleeding, raveling [83] |
10-14-10-10-1 | $R^2$ = 0.9910 and 0.9750, RMSE = 0.0210 and 0.0280 | Pavement age, rutting, fatigue cracking, block cracking, longitudinal cracking, transverse cracking, potholes, patching, bleeding, and raveling [55] |
11-19-1 | $R^2$ = 0.9340, ASE = 0.0009, MARE = 4.8410 | Age, concrete pavement thickness, subbase thickness, average contraction distance, CESAL, subbase material type, climate region, and construction number [84] |
15-19-2 | $R^2$ = 0.9500, MARE = 5.7300, ASE = 0.0000031 | Initial Longitude and Latitude, Final Longitude and Latitude, Thickness, Section Length, Section Age in 2010, PCR in 2010, IRI in 2010, Time since 2010, Minor Rehabilitation, Major Rehabilitation, Equivalent Single Axle Load (ESAL), Cumulative Equivalent Single Axle Load (CESAL), PRE PCR, PRE IRI, IRI [85] |
7-9-9-1 | $R^2$ = 0.8280, RMSE = 0.010, MAPE = 0.010 | Mean Annual Air Temperature, Annual Average Freezing Index, Annual Average Maximum and Minimum Humidity, Annual Average Precipitation, Annual Average Daily Traffic, and Annual Average Daily Truck Traffic [4] |

The output layer consists of a single neuron with a purelin (linear) transfer function, suitable for continuous output prediction, yielding $R$ = 0.9554, $R^2$ = 0.9020, MSE = 0.0153, RMSE = 0.1236, and MAPE = 0.0285. While its input layer is less complex than those of recent studies such as Ali et al. [83], with their 10-20-15-15-1 configuration ($R^2$ = 0.9680), and Ali et al. [55], with their 10-14-10-10-1 configuration ($R^2$ = 0.9910), our model demonstrates significant efficiency by achieving satisfactory accuracy with fewer input parameters. Unlike other studies that require extensive pavement condition data (including various types of cracking, potholes, and surface defects) or environmental factors, as in the study by Hossain et al. [4], our model achieves reliable prediction capability using only ESA and road age data. This streamlined approach offers practical advantages in implementation, particularly in scenarios where detailed pavement condition data may be limited or costly to obtain. While numerically lower than those of some recent studies, the model's performance metrics represent a balanced compromise between model complexity and practical applicability, making the model particularly suitable for routine road maintenance planning applications.
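To make the 6-30-25-20-1 layout concrete, the sketch below runs a single forward pass through a network of that shape, with logsig in the three hidden layers and purelin at the output. The weights are randomly initialized placeholders (not the trained values), the six input values are hypothetical scaled inputs, and all helper names are illustrative.

```python
import math
import random

LAYER_SIZES = [6, 30, 25, 20, 1]  # inputs, three hidden layers, output
ACTIVATIONS = ["logsig", "logsig", "logsig", "purelin"]

def logsig(x: float) -> float:
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-x))

def init_layers(sizes, seed=0):
    """Create (weights, biases) for each layer with small random values."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers

def forward(x, layers, activations):
    """Propagate one input vector through all layers."""
    for (weights, biases), act in zip(layers, activations):
        z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        # purelin leaves the weighted sum unchanged.
        x = [logsig(v) for v in z] if act == "logsig" else z
    return x

layers = init_layers(LAYER_SIZES)
# Trainable parameters: weights plus biases in every layer.
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
prediction = forward([0.2, 0.1, 0.4, 0.3, 0.5, 0.6], layers, ACTIVATIONS)
print(f"{n_params} parameters, untrained output = {prediction[0]:.4f}")
```

Counting weights and biases layer by layer gives 210 + 775 + 520 + 21 = 1526 trainable parameters for this architecture.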
The achieved performance metrics demonstrate high practical accuracy for maintenance decision-making applications. The RMSE value of 0.1236 m/km indicates that predictions typically deviate by less than 0.13 m/km from actual IRI values. Given established IRI thresholds for pavement condition classification: Very Good ($<$2.86 m/km), Good (2.86–4.49 m/km), Fair (4.50–5.69 m/km), Mediocre (5.70–8.08 m/km), and Poor ($>$8.08 m/km) [16], [66], this prediction accuracy is highly suitable for maintenance decision-making.
The MAPE value of 0.0285 (2.85%) indicates excellent relative accuracy, well below the 5% threshold commonly accepted for pavement management applications [23], [86]. For practical maintenance planning, prediction errors of $\pm$0.13 m/km allow accurate classification of pavement condition categories, enabling timely intervention decisions. For example, a pavement with an actual IRI of 4.0 m/km (Good condition) would be predicted to be between 3.87 and 4.13 m/km, maintaining the same condition classification and appropriate maintenance timing.
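The worked example above can be expressed as a small classifier over the quoted condition thresholds. This is an illustrative sketch: the function names are hypothetical, the boundary handling between adjacent classes (for instance values between 4.49 and 4.50) is an assumption, and the 0.13 m/km band is treated as a typical rather than a guaranteed error.

```python
def classify_iri(iri: float) -> str:
    """Map an IRI value (m/km) to the condition classes quoted in the text."""
    if iri < 2.86:
        return "Very Good"
    if iri < 4.50:
        return "Good"
    if iri < 5.70:
        return "Fair"
    if iri <= 8.08:
        return "Mediocre"
    return "Poor"

def band_is_stable(iri: float, band: float = 0.13) -> bool:
    """True if the class is unchanged across iri - band .. iri + band."""
    return classify_iri(iri - band) == classify_iri(iri) == classify_iri(iri + band)

for value in (4.0, 4.45):
    print(value, classify_iri(value), band_is_stable(value))
```

A segment at 4.0 m/km keeps its class across the whole band, whereas one at 4.45 m/km does not, which is why segments close to a threshold merit closer attention.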
The developed ANN model employs a 6-30-25-20-1 configuration (Figure 4), achieving $R^2$ = 0.9020, RMSE = 0.1236 m/km, and MAPE = 0.0285 using only ESA and road age inputs. This performance compares favorably with recent ANN studies while demonstrating practical advantages through simplified data requirements.
Contemporary ANN applications for IRI prediction typically achieve higher $R^2$ values through extensive input parameters. Ali et al. [83] reported $R^2$ = 0.9680 (RMSE = 0.087) using ten distress indicators, including various cracking types, potholes, and surface defects, while Ali et al. [55] achieved $R^2$ = 0.9910 using similar comprehensive inputs. Although numerically lower, our model’s $R^2$ = 0.9020 represents acceptable accuracy for maintenance planning while requiring substantially fewer parameters, reducing annual data collection costs by an estimated 60–75% compared to comprehensive pavement condition surveys.
Traditional pavement management approaches employ mechanistic-empirical or empirical regression models. The American Association of State Highway and Transportation Officials (AASHTO) Mechanistic-Empirical Pavement Design Guide (MEPDG) integrates multi-layer elastic theory with empirical transfer functions. According to the National Academies implementation study [87], MEPDG IRI prediction models typically achieve $R^2$ values ranging from moderate to good for asphalt pavements, with comparable performance for jointed plain concrete pavements. Well-calibrated regional implementations can demonstrate improved accuracy [88], [89], though they require detailed inputs including hourly climate data, complete layer properties, and comprehensive traffic spectra. Empirical models such as the World Bank’s Highway Development and Management (HDM-4) system use polynomial or exponential functions calibrated to regional data. Studies on HDM-4 calibration for local conditions report varying accuracy levels; Thube [90] achieved reasonable predictions after calibration for low-volume roads in India, though roughness progression models generally demonstrate moderate predictive performance. These empirical approaches require a moderate number of input parameters but demonstrate limited capacity to capture the complex nonlinear interactions inherent in pavement deterioration processes.
The ANN approach offers distinct advantages over these traditional methods. Unlike empirical models that require predetermined functional forms, ANNs learn complex nonlinear relationships directly from data without a priori assumptions about deterioration patterns [91]. Gharieb et al. [74] demonstrated ANN models for Double Bituminous Surface Treatment (DBST) and Asphalt Concrete (AC) pavements in Laos, achieving $R^2$ values of 0.96 and 0.94, respectively, and significantly outperforming Multiple Linear Regression models developed under identical conditions. Compared to MEPDG’s mechanistic foundation, the data-driven ANN provides comparable or superior predictive accuracy with minimal input requirements and computational efficiency, making it suitable for network-level analysis [24]. This positions the developed model as a practical middle-ground solution: more accurate than basic empirical regression through non-linear learning capability, yet more implementable than mechanistic-empirical approaches through streamlined data requirements.
However, important trade-offs exist. Mechanistic-empirical models provide transparent cause-and-effect relationships valuable for design optimization and forensic investigations, while ANNs function as black-box models with limited mechanistic insight [70]. MEPDG excels in evaluating structural alternatives during design phases, whereas the ANN framework focuses on network-level condition prediction for routine maintenance planning. These approaches serve complementary rather than competing purposes within comprehensive PMS.
The RMSE accuracy of 0.1236 m/km enables reliable pavement condition classification within established Indonesian IRI thresholds: Very Good ($<$2.86 m/km), Good (2.86–4.49 m/km), Fair (4.50–5.69 m/km), Mediocre (5.70–8.08 m/km), and Poor ($>$8.08 m/km) [18], [69]. Prediction errors of $\pm$0.13 m/km enable accurate maintenance decision-making, supporting timely intervention while avoiding premature repairs. This precision is particularly valuable for toll road operations, where maintenance scheduling directly affects traffic flow and user satisfaction [24].
Parameter selection criteria are methodically designed to optimize model performance. The number of neurons is determined by the complexity of the input features, with parsimony principles applied to avoid overfitting. The transfer function selection is based on the data distribution characteristics, considering the range of values and the nonlinear pattern. The learning algorithm is selected based on computational efficiency and convergence speed, and adjusted to the problem’s complexity and the dataset’s size.
The optimization approach uses a hierarchical framework to investigate network configurations systematically. Beginning with a basic design, the model’s complexity is increased progressively for as long as performance improves. Every stage in the optimization hierarchy balances the trade-off between model complexity and performance improvement.
While the optimized model demonstrates strong predictive performance (RMSE = 0.1236 m/km, MAPE = 0.0285), understanding potential sources of prediction variability provides critical insights for practical implementation and model interpretation.
Sources of Prediction Variability: Prediction uncertainty arises from multiple sources across four main categories. Traffic-related variability includes: vehicle classification errors at class boundaries ($\pm$5–8% variation in ESA calculations), seasonal overloading variations (higher during peak travel periods such as Ramadan homecoming, Christmas, and New Year holidays), and lane distribution changes during construction or incidents that deviate from assumed factors (Table 3).
Temporal and environmental factors encompass: seasonal deterioration acceleration during wet periods due to moisture-induced subgrade weakening, non-linear material aging, particularly in pavements exceeding 15 years, and year-to-year climate variations (exceptional rainfall or extended dry periods) not fully captured by annual data aggregation.
Construction and maintenance history affects accuracy through field variations in construction quality (compaction, layer thickness, material properties), undocumented minor maintenance activities (crack sealing, surface treatments) that create apparent deterioration slowdowns, and localized distress from utility cuts or drainage issues.
Model boundary limitations include increased uncertainty for: extreme traffic loads exceeding training data range ($>$40 ESA/year per lane), early-age pavements ($<$2 years) dominated by compaction settlement effects, and severely aged pavements ($>$20 years) with accelerated non-linear deterioration.
Practical Implications: These error sources inform three key application strategies. First, practitioners should apply $\pm$0.25 m/km confidence bands for planning purposes, particularly for pavements approaching maintenance intervention thresholds. Second, periodic model recalibration using recent condition survey data (every 3–5 years) maintains prediction accuracy as network characteristics evolve. Third, segments with extreme ESA, exceptional climate exposure, or approaching critical thresholds warrant more frequent condition verification to supplement model predictions and ensure timely maintenance decisions.
The model’s prediction accuracy directly supports operational maintenance decisions through alignment with standard IRI-based intervention thresholds. International pavement management practice establishes explicit action triggers: routine maintenance (IRI $<$ 2.0 m/km), preventive maintenance including thin overlays (2.0–4.0 m/km), corrective rehabilitation (4.0–8.0 m/km), and major reconstruction (IRI $\geq$ 8.0 m/km). With RMSE = 0.1236 m/km, the model provides sufficient precision to classify pavement condition reliably within these categories, with misclassification risk largely confined to predictions that fall within about $\pm$0.25 m/km of a threshold boundary.
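A minimal sketch of how these action triggers and the $\pm$0.25 m/km band could be combined in a screening routine is given below. The function name and return format are hypothetical, and assigning a value of exactly 4.0 m/km to corrective rehabilitation is an assumption, since the quoted ranges share that boundary.

```python
from bisect import bisect_right

# Intervention boundaries (m/km) and the action that applies in each interval.
BOUNDARIES = [2.0, 4.0, 8.0]
ACTIONS = [
    "routine maintenance",
    "preventive maintenance",
    "corrective rehabilitation",
    "major reconstruction",
]

def recommend(predicted_iri: float, band: float = 0.25):
    """Return (action, near_threshold) for a predicted IRI value."""
    # bisect_right places a value equal to a boundary in the upper interval.
    action = ACTIONS[bisect_right(BOUNDARIES, predicted_iri)]
    near_threshold = any(abs(predicted_iri - b) <= band for b in BOUNDARIES)
    return action, near_threshold

for iri in (1.9, 3.0, 5.0, 8.0):
    action, flag = recommend(iri)
    note = " (verify in field)" if flag else ""
    print(f"IRI {iri:.1f} m/km: {action}{note}")
```

Segments flagged as near a threshold are natural candidates for supplementary condition surveys before an intervention is committed.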
For PMS implementation, road agencies can deploy the model for network-level planning through three practical applications. First, annual screening identifies segments approaching intervention thresholds within 1–3-year planning horizons, enabling proactive budget preparation and contractor mobilization. Second, intervention timing optimization compares life-cycle costs of alternative maintenance strategies—for example, preventive treatment when predicted IRI reaches 3.5 m/km versus deferred action until 5.0 m/km—considering treatment costs, deterioration rates, and traffic impacts. Third, prioritization frameworks rank segments by predicted IRI exceedance and traffic importance, facilitating optimal resource allocation under budget constraints. The model’s computational efficiency and minimal data requirements enable rapid deployment across entire road networks, supporting both immediate tactical decisions and long-term strategic planning in resource-constrained environments.
The developed ANN model demonstrates significant potential for practical implementation in PMS. The streamlined input requirements (ESA and road age) facilitate integration with existing toll road monitoring infrastructure, requiring minimal additional data collection efforts. The model’s computational efficiency enables real-time applications and routine maintenance planning across extensive road networks.
Implementation Benefits: The systematic optimization approach provides several practical advantages: a standardized methodology enabling consistent application across different toll road segments, reproducible results supporting decision accountability, computational efficiency suitable for resource-constrained environments, and a scalable framework that accommodates network expansion.
Future Research Directions: Several promising research avenues emerge from this study: (1) integration of environmental factors (temperature fluctuations, precipitation patterns) to enhance model robustness for climate-sensitive regions, (2) development of multi-objective optimization incorporating cost-benefit analysis for comprehensive maintenance planning, (3) extension to rigid pavement applications with appropriate parameter modifications, (4) implementation of ensemble methods combining multiple ANN architectures for enhanced prediction accuracy.
Real-time data integration and geographic adaptability represent important opportunities for future model enhancement. Advanced validation techniques, including ensemble approaches and uncertainty quantification, could further improve model reliability and practical applicability.
This model is optimized for Indonesian toll road flexible pavements with ESA values of 5–40 ESA/year per lane and pavement ages 2–20 years; caution is needed for extreme traffic conditions, very young/old pavements, and regions with highly variable climatic conditions not represented in the training data. The absence of environmental variables (temperature, precipitation, freeze-thaw cycles) is a key limitation, though the minimal-data approach enables practical deployment when comprehensive monitoring infrastructure is unavailable. The model performs best for network-level screening and multi-year maintenance planning under typical operational conditions with systematic traffic monitoring; segments approaching critical thresholds or experiencing exceptional conditions warrant supplementary field verification with $\pm$0.25 m/km confidence bands.
4. Conclusions
This study develops a systematic algorithm for optimizing ANN parameters in IRI prediction models for toll road pavement management. Through comprehensive analysis and rigorous validation, the following key findings demonstrate how this framework directly supports transportation infrastructure management:
1. The optimization algorithm refines ANN parameters through four iterative stages with statistical validation. The 6-30-25-20-1 configuration achieved strong performance metrics ($R$ = 0.9554, $R^2$ = 0.9020, MSE = 0.0153, RMSE = 0.1236, MAPE = 0.0285). This accuracy enables reliable pavement condition classification within established IRI thresholds and supports three infrastructure management applications: (1) Long-term Planning: 5–10-year condition forecasting for multi-year budget allocation; (2) Resource Optimization: 60–75% reduction in data collection costs through minimal input requirements; (3) Network-Level Support: simultaneous analysis of hundreds of road segments for prioritizing maintenance activities within budget constraints.
2. Model Architecture Optimization: The effective 6-30-25-20-1 configuration achieved an optimal balance between architectural complexity and predictive accuracy through the systematic application of feed-forward backpropagation with Trainlm and Learngdm functions. Strategic selection of transfer functions (logsig-logsig-logsig-purelin) enhanced model performance while maintaining computational efficiency. The optimization process explored 720 parameter combinations, demonstrating a thorough investigation of the parameter space.
3. Comparative Performance: Relative to contemporary research, the model demonstrates commendable accuracy using minimal input parameters, achieving practical efficiency through the utilization of only ESA and road age data. This streamlined approach proves effective across varied contexts, particularly valuable when comprehensive pavement condition data collection is resource-intensive or challenging.
4. Statistical Validation: Statistical analysis confirms model improvements, including ANOVA results ($F$ = 24.367, $p$ $<$ 0.05) showing meaningful differences between configurations. Cross-validation results (mean $R^2$ = 0.892, SD = 0.034) demonstrate consistent performance across diverse datasets, while residual analysis validates model assumptions and reliability measures. Bootstrap validation (1000 resamples) with narrow 95% confidence intervals [$R^2$: 0.868–0.916] further confirms model stability.
5. Implementation Value: The developed algorithm provides practical solutions for road maintenance planning by minimizing data collection requirements while maintaining prediction accuracy. Consistent IRI forecasting enables proactive maintenance scheduling, potentially yielding substantial operational cost savings while preserving infrastructure quality. The computational efficiency (execution time $<$2 hours) and minimal software dependencies facilitate deployment across resource-constrained environments.
6. Future Adaptability: The algorithm framework exhibits adaptability to flexible pavement applications with considerable potential for integrating environmental parameters. Advanced validation methodologies, including bootstrapping and ensemble techniques, ensure robust performance across diverse operational contexts and provide a foundation for future research in PMS.
7. Limitations and Future Research: While acknowledging limitations in environmental factor inclusion (temperature variation, precipitation, freeze-thaw cycles), the current model prioritizes practical implementability and data availability. Future research should investigate: (1) integration of climate variables through hierarchical modeling or regional calibration factors, (2) extension to rigid pavement applications with appropriate parameter modifications, (3) development of ensemble methods combining multiple ANN architectures, and (4) implementation of real-time data integration for dynamic prediction updates.
This research makes significant contributions to transportation engineering by providing a systematic, validated methodology for IRI prediction that balances accuracy with practical implementation feasibility. The algorithm’s demonstrated effectiveness and flexible framework establish a strong foundation for advancing PMS globally.
Author Contributions: Conceptualization, L.L.; methodology, L.L.; software, L.L.; validation, L.L. and J.U.D.H.; formal analysis, L.L.; investigation, L.L.; resources, L.L.; data curation, L.L. and J.U.D.H.; writing - original draft preparation, L.L.; review and editing, L.L., M.A.W., and J.U.D.H.; visualization, L.L.; supervision, M.A.W.; project administration, L.L.; funding acquisition, M.A.W. All authors have read and agreed to the published version of the manuscript.
Data Availability: The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest: The authors declare that they have no conflicts of interest.
