Deep Learning for Sustainable Waste Management Through Leachate Volume Prediction with Multilayer Perceptron
Abstract:
Landfill leachate poses a major challenge to urban waste management, particularly in tropical regions with high rainfall and heterogeneous waste composition. This study developed an artificial neural network (ANN) based on a multilayer perceptron (MLP) architecture to predict leachate volume at the Supit Urang landfill in Malang City, Indonesia. The dataset combined primary measurements of leachate discharge with secondary meteorological and environmental data, including rainfall, temperature, humidity, wind, and waste volume. Data preprocessing involved cleaning, imputation, transformation, and normalization to improve data quality and model readiness. The ANN model used two hidden layers with 64 neurons each and was optimized with the Adam algorithm, early stopping, and L2 regularization to balance predictive accuracy and generalization. The model achieved an R$^2$ of 0.61 and correlation coefficients above 0.82, indicating a good ability to capture nonlinear relationships and overall leachate trends. However, the relatively high root mean square error (RMSE) values showed that individual predictions still deviated substantially from observed values. Overall, the findings indicate that ANN models are promising decision-support tools for sustainable landfill management, although further improvements in data quality and model optimization are still required. The study also offers practical insight for estimating leachate generation and planning treatment strategies in urban landfills.
1. Introduction
Waste management has become a major global environmental challenge, particularly in rapidly urbanizing regions where population growth, economic development, and changing consumption patterns increase waste generation [1], [2], [3], [4], [5], [6]. Municipal solid waste (MSW) continues to grow, while the availability of sanitary and environmentally sound disposal facilities remains limited [7], [8], [9]. In many developing countries, including Indonesia, final disposal sites face serious capacity and operational constraints. One of the most critical environmental byproducts of these landfills is leachate, a complex and highly contaminated liquid generated by the percolation of rainwater and moisture through waste piles [10], [11], [12]. Uncontrolled leachate threatens surface water and groundwater quality, contributes to ecosystem degradation [13], [14], [15], [16], [17], [18], [19], [20], causes odor problems, and poses serious public health risks [21], [22], [23].
Over the past two decades, numerous studies have examined leachate generation and treatment. Traditional approaches have typically relied on empirical or semi-empirical models based on hydrological balance, waste composition, and meteorological factors. For example, the U.S. Environmental Protection Agency (EPA) [21], [24], [25], [26], [27] and subsequent international studies developed water-balance methods to estimate leachate generation [28], [29], [30]. Similarly, landfill water-balance models such as the Hydrologic Evaluation of Landfill Performance (HELP) and the Leaching Estimation and Chemistry Model (LEACHM) have been widely applied [31], [32], [33]. However, these models often require extensive site-specific parameters, depend on simplifying assumptions, and struggle to capture the nonlinear and dynamic nature of leachate formation. As a result, their predictive accuracy in complex tropical environments such as Indonesia remains limited.
Recent advances in machine learning (ML) and artificial intelligence (AI) provide new opportunities to address these limitations. Previous studies have applied ML algorithms—including Random Forest, Support Vector Machines, and Artificial Neural Networks (ANNs)—to model various aspects of solid waste management, from waste generation forecasting to landfill gas emissions [34], [35], [36], [37], [38]. In the specific context of leachate prediction, several studies have shown that ANNs can capture nonlinear relationships among rainfall, waste characteristics, and landfill age [39], [40]. However, most existing studies focus on treatment efficiency or water-quality parameters, such as COD, BOD, and ammonia removal, rather than directly modelling leachate volume dynamics [41], [42], [43], [44]. In addition, earlier research has been dominated by case studies from temperate regions, with relatively limited attention to tropical landfills characterized by high rainfall, mixed waste composition, and open-dumping practices.
This study addressed that gap by applying a Multilayer Perceptron (MLP), a widely used ANN architecture, to predict leachate volume at an Indonesian landfill. By integrating historical rainfall, landfill characteristics, and waste-related data, the study developed a predictive model capable of capturing nonlinear relationships more effectively than conventional statistical or empirical approaches. The novelty of this work lies not only in its focus on leachate volume prediction in a tropical developing-country context, but also in demonstrating how deep learning can be translated into a practical decision-support tool for landfill management.
Previous applications of data-driven models to landfill leachate have reported high predictive performance, but mostly for water-quality parameters rather than leachate quantity. For example, Azadi et al. showed that ANN and PCA–M5P models could predict leachate COD load in lab-scale landfills with mean absolute percentage errors of about 4% and 12% and correlation coefficients above 0.98 [45]. Alizamir et al. further demonstrated that a hybrid ELM–GWO model yields lower root mean square error (RMSE) and mean absolute error (MAE) than single-stage ML models when predicting COD, BOD, turbidity, and electrical conductivity in landfill leachate and groundwater [46]. More recently, Ishii et al. reported R$^2$ values approaching 1.0 for daily leachate quantity prediction using a long short-term memory (LSTM) model at a temperate landfill site [47]. By contrast, only a few studies have directly addressed leachate volume prediction in tropical landfills with mixed waste and open dumping practices. Situating our model alongside these benchmarks helps clarify its relative strengths and limitations and highlights the need for context-specific performance evaluation.
This study had three objectives: (1) to construct and validate an MLP-based predictive model for landfill leachate volume; (2) to evaluate its performance against conventional approaches; and (3) to provide insights for sustainable landfill operation and policy-making. The contribution is both theoretical and practical. Theoretically, the study extends the literature on ML applications in waste management by addressing the underexplored topic of leachate volume prediction. Practically, it offers landfill operators and policymakers a tool for planning, monitoring, and mitigating environmental risks associated with leachate, thereby supporting broader goals of sustainable waste management and environmental protection.
2. Methodology
This study was conducted in several stages: data collection, data preprocessing, ANN model development using the backpropagation algorithm, model evaluation, and validation and interpretation (see Figure 1). Each stage was designed systematically to generate accurate and practically useful predictions of leachate volume for sustainable landfill management.
Data were collected through a combination of primary measurements, secondary sources, and field observations. The study was conducted at the Supit Urang landfill in Malang City, East Java, Indonesia (South Pandan, Pandanlandung, Wagir District, Malang Regency, East Java 65158). The site has operated as the main MSW disposal facility since the early 1990s and serves Malang City and surrounding districts. Supit Urang is located in a humid tropical monsoon climate with distinct wet and dry seasons, and annual rainfall typically exceeds 2,000 mm. Landfill management records show that daily waste input generally ranges from 600 to 800 tons and is dominated by organic household waste (about 60%–70%), with smaller fractions of plastics, paper, and inert materials. The active landfill cells cover several hectares and are managed using an improved sanitary landfill system with periodic soil cover. These climatic and operational conditions favor intensive leachate generation, making Supit Urang a representative site for testing ANN-based leachate volume prediction. Primary measurements provided high-resolution records of leachate dynamics over specific time intervals and formed the core of the dataset.
Secondary data were obtained from relevant institutions, including the Environmental Agency, the Meteorology, Climatology, and Geophysics Agency, and the landfill management unit. These data included daily rainfall, ambient temperature, humidity, and daily waste volume, as well as physicochemical characteristics of leachate such as COD, BOD, and pH. The secondary datasets expanded temporal coverage and complemented the primary measurements.

Field observations were also conducted to understand leachate flow patterns, landfill cell conditions, and surrounding environmental factors that could not be fully captured by quantitative records. By integrating these three sources, the study established a comprehensive and representative dataset for ANN training and testing.
Before model training, the raw data underwent several preprocessing steps to improve quality, consistency, and interpretability. The first step was data cleaning, during which missing or anomalous values were identified and corrected. For example, rainfall values recorded as “8888” were treated as missing values and temporarily replaced with zero.
The second step was data imputation, in which missing or zero values were replaced with the monthly average rainfall to preserve representative distributions. This was followed by data transformation, including the calculation of monthly rainfall totals, estimation of the proportional contribution of daily rainfall, and redistribution of monthly leachate volumes into daily values based on rainfall proportions.
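The cleaning, imputation, and redistribution steps can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the authors' script: the data values, the column names `date`, `RR_share`, and `leachate_daily`, and the use of `NaN` as the missing-value marker are all hypothetical. The text describes zero as the temporary placeholder and also imputes zero values; this sketch flags only the `8888` sentinel so that genuinely dry days are left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical daily rainfall record (mm); 8888 is the sentinel mentioned in the text.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04",
                            "2023-02-01", "2023-02-02", "2023-02-03"]),
    "RR": [10.0, 8888.0, 30.0, 20.0, 5.0, 15.0, 8888.0],
})
# Hypothetical monthly leachate totals (m^3), keyed by year-month.
monthly_leachate = pd.Series({"2023-01": 800.0, "2023-02": 300.0})

# 1) Cleaning: mark the sentinel as missing (NaN is an assumption of this sketch).
daily["RR"] = daily["RR"].replace(8888.0, np.nan)

# 2) Imputation: fill gaps with the mean of the observed days in the same month.
daily["month"] = daily["date"].dt.strftime("%Y-%m")
daily["RR"] = daily["RR"].fillna(daily.groupby("month")["RR"].transform("mean"))

# 3) Transformation: monthly rainfall totals and each day's share of its month.
#    A month with zero total rainfall would need separate handling (division by zero).
daily["RR_month_total"] = daily.groupby("month")["RR"].transform("sum")
daily["RR_share"] = daily["RR"] / daily["RR_month_total"]

# 4) Redistribution: split each monthly leachate volume across days by rainfall share.
daily["leachate_daily"] = daily["RR_share"] * daily["month"].map(monthly_leachate)

print(daily[["date", "RR", "RR_share", "leachate_daily"]])
```

By construction, the daily leachate values within each month sum back to the monthly total, so the redistribution preserves the measured volume while borrowing its day-to-day shape from rainfall.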
In addition, categorical features such as wind direction were encoded. One-hot encoding was used to convert text labels (e.g., N, E, SW, and W) into binary numerical variables suitable for ANN input. The dataset was then split into two subsets, with 70% used for training and 30% for testing to allow independent performance evaluation. Finally, scaling was performed using the StandardScaler function so that all numerical features shared a comparable range, thereby reducing scale-related bias and improving model convergence.
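A possible scikit-learn arrangement of the encoding, splitting, and scaling steps is sketched below. The data are synthetic, the column names are borrowed from Table 1, and the shuffled split with `random_state=42` is an assumption, since the text does not state whether the split was random or chronological. Fitting the scaler and encoder on the training subset only, then applying them to the testing subset, keeps test information out of the preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(42)
n = 200
# Synthetic stand-in data; values are illustrative only, not from the study.
X = pd.DataFrame({
    "RR": rng.gamma(2.0, 5.0, n),           # rainfall (mm)
    "Tavg": rng.normal(25.5, 0.5, n),       # average temperature
    "RH_avg": rng.normal(87.0, 2.0, n),     # average humidity (%)
    "ddd_car": rng.choice(["N", "E", "SW", "W"], n),  # most frequent wind direction
})
y = 3.0 * X["RR"] + rng.normal(0.0, 5.0, n)  # hypothetical target

# 70% training / 30% testing, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["RR", "Tavg", "RH_avg"]),
        ("wind", OneHotEncoder(handle_unknown="ignore"), ["ddd_car"]),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# Fit on the training subset only, then reuse the fitted transformers on the test subset.
X_train_p = preprocess.fit_transform(X_train)
X_test_p = preprocess.transform(X_test)

print(X_train_p.shape, X_test_p.shape)
```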
The predictive model was developed using an MLP architecture trained with the backpropagation algorithm. The architecture consisted of three types of layers. The input layer included predictor variables such as rainfall, daily waste input, ambient temperature, humidity, and other environmental parameters. The hidden layers captured nonlinear interactions among these variables, and the number of neurons was optimized experimentally to obtain the best performance. The output layer generated the predicted leachate volume [45], [46], [47], [48].
A nonlinear activation function was used in the hidden layers (tanh in the final configuration reported in Table 2), whereas a linear activation function was applied in the output layer for continuous prediction. The training process involved weight and bias initialization, forward propagation to generate predictions, error calculation using mean squared error as the loss function, backpropagation to compute gradients and update weights through gradient-based optimization, and iterative optimization across epochs until convergence was reached [49], [50].
Model performance was assessed using three metrics: RMSE, coefficient of determination (R$^2$), and the correlation coefficient. RMSE measured the typical magnitude of the deviation between predicted and observed values, expressed in the units of the target and weighting large errors more heavily; lower values indicate higher predictive accuracy. R$^2$ measured the proportion of variance in the observed data explained by the model, with values closer to 1 indicating stronger predictive performance.
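For reference, the forward pass and training objective described above can be written compactly for a network with two hidden layers. The notation is introduced here for illustration and is not taken from the source: $\phi$ is the hidden-layer activation, $\mathbf{W}^{(l)}$ and $\mathbf{b}^{(l)}$ are the weights and biases of layer $l$, $\lambda$ is the L2 regularization strength, and $\eta$ is the learning rate. The exact scaling constants of the loss depend on the software implementation.

$$
\begin{aligned}
\mathbf{h}^{(1)} &= \phi\left(\mathbf{W}^{(1)}\mathbf{x} + \mathbf{b}^{(1)}\right),\\
\mathbf{h}^{(2)} &= \phi\left(\mathbf{W}^{(2)}\mathbf{h}^{(1)} + \mathbf{b}^{(2)}\right),\\
\hat{y} &= \mathbf{W}^{(3)}\mathbf{h}^{(2)} + b^{(3)},\\
\mathcal{L} &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{l=1}^{3}\left\lVert \mathbf{W}^{(l)} \right\rVert_F^2,\\
\theta &\leftarrow \theta - \eta\,\nabla_{\theta}\mathcal{L}.
\end{aligned}
$$

The last line is the plain gradient-descent update, with $\theta$ standing for all weights and biases. Adaptive optimizers such as Adam use the same backpropagated gradient but rescale the step for each parameter.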
The correlation coefficient was used to assess the consistency of patterns between predicted and observed values. Although it does not directly quantify prediction error, it provides insight into the model’s ability to reproduce observed trends. To highlight the advantages of deep learning, the ANN results were also compared with those of conventional approaches such as linear regression.
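The three metrics can be computed directly from paired observed and predicted values. The sketch below uses NumPy and made-up numbers chosen so that the predictions are exactly two-thirds of the observations. This shows why the correlation coefficient alone does not quantify error: the correlation is perfect even though every prediction is biased low.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Illustrative values only, not data from the study.
observed = [60.0, 30.0, 180.0, 90.0]
predicted = [40.0, 20.0, 120.0, 60.0]  # exactly 2/3 of each observed value

print(f"RMSE = {rmse(observed, predicted):.2f}")
print(f"R^2  = {r2(observed, predicted):.4f}")
print(f"r    = {pearson_r(observed, predicted):.4f}")
```

Here the correlation equals 1 while R$^2$ is only about 0.60, because R$^2$ penalizes the systematic underestimation that the correlation coefficient ignores.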
To ensure robustness and reduce overfitting, cross-validation techniques were applied. This procedure allowed the model to be evaluated across multiple data folds and improved its generalization capability. The predicted outputs were then interpreted in the operational context of landfill management, including the estimation of treatment-unit capacity, planning of leachate storage requirements, and design of strategies to mitigate environmental risks.
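A k-fold cross-validation of this kind might look like the following sketch. The number of folds, the shuffling, the small network size, and the synthetic data are all assumptions made for illustration, because the text does not specify the cross-validation scheme. Placing the scaler inside the pipeline means it is refit on the training portion of every fold.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features and a mildly nonlinear target (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = 2.0 * X[:, 0] + np.tanh(X[:, 1]) + rng.normal(0.0, 0.1, 150)

# Scaling and the regressor are bundled so each fold is preprocessed independently.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)

cv = KFold(n_splits=5, shuffle=True, random_state=0)  # 5 folds is an assumption
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")

print("R^2 per fold:", np.round(scores, 3))
print("mean R^2:", round(float(scores.mean()), 3))
```

For daily records with strong temporal dependence, a chronological scheme such as scikit-learn's `TimeSeriesSplit` may be a more conservative choice than shuffled folds.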
Through these steps, the study contributed methodologically to the application of deep learning in leachate prediction while also generating practical insight for more sustainable waste management in urban landfills.
3. Results
Table 1 presents sample observations from the dataset used in this study, combining meteorological and environmental attributes. The meteorological variables included minimum, maximum, and average temperature, relative humidity, daily rainfall, sunshine duration, and wind speed and direction. The environmental attribute was leachate volume, which served as the target variable. In the sample data, daily rainfall ranged from 4.4 to 23.5 mm, while relative humidity remained high at approximately 87%–88%. Observed leachate volumes varied widely, from about 33 m$^3$ to more than 176 m$^3$, reflecting the dynamic interaction among meteorological conditions, landfill operations, and waste composition at the study site [30], [46], [47], [51].
Table 1. Sample observations from the dataset

| Attribute | Description | Sample 1 | Sample 2 | Sample 3 |
|---|---|---|---|---|
| Tn | Minimum temperature (°C) | 23.8 | 22.9 | 22.4 |
| Tx | Maximum temperature (°C) | 31 | 30.8 | 30.2 |
| Tavg | Average temperature (°C) | 25.9 | 25.4 | 25.7 |
| RH_avg | Average relative humidity (%) | 87 | 88 | 87 |
| RR | Rainfall (mm) | 8.5 | 4.4 | 23.5 |
| ss | Duration of sunshine (hours) | 5.1 | 1.6 | 1.6 |
| ff_x | Maximum wind speed (m/s) | 1 | 3 | 2 |
| ddd_x | Wind direction at maximum speed (°) | 311 | 112 | 217 |
| ff_avg | Average wind speed (m/s) | 1 | 1 | 1 |
| ddd_car | Most frequent wind direction | SW | E | W |
| Leachate | Liquid from the percolation of landfill waste (m$^3$) | 63.6679 | 32.9575 | 176.0230224 |
These descriptive statistics underscore the complexity of leachate generation. Rather than depending on a single parameter, leachate volume appears to be shaped by a combination of climatic factors, waste inputs, and environmental conditions, highlighting the need for a modelling approach capable of capturing nonlinear interactions.
The ANN developed in this study was implemented as an MLPRegressor with two hidden layers, each containing 64 neurons (Table 2). The tanh activation function was selected because it maps each hidden unit's output into the bounded, zero-centered range (-1, 1), which pairs well with standardized inputs and can support stable training and generalization. Training was performed using the Adam optimizer, which adapts the step size of each parameter during training to support efficient convergence.
Table 2. ANN (MLPRegressor) hyperparameters

| Parameter | Value |
|---|---|
| hidden_layer_sizes | (64, 64) |
| activation | tanh |
| solver | adam |
| alpha | 0.001 |
| learning_rate_init | 0.005 |
| learning_rate | adaptive |
| batch_size | 128 |
| max_iter | 5000 |
| random_state | 42 |
| early_stopping | True |
| n_iter_no_change | 60 |
Regularization and early stopping were applied to prevent overfitting and to maintain robustness when the model was exposed to unseen data. Training was allowed to run for a maximum of 5000 iterations (epochs), with early stopping set to halt training once no improvement was observed for 60 consecutive iterations. These strategies enabled the ANN to capture nonlinear relationships between meteorological variables and leachate volume while balancing predictive accuracy and generalization.
The model-evaluation results are summarized in Table 3. The ANN achieved an RMSE of 82.42 on the training set and 35.71 on the testing set, expressed in the same units as leachate volume. The R$^2$ values were similar for the training (0.6027) and testing (0.6125) sets, indicating that approximately 60%–61% of the variance in leachate volume was explained by the input variables and that the model generalized reasonably well to unseen data. Because R$^2$ was nearly identical across the two subsets, the lower testing RMSE most likely reflects smaller variability of leachate volume in the testing subset rather than a better fit.
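The configuration in Table 2 corresponds to the following scikit-learn call. This is a sketch fitted to synthetic data, so the feature matrix, the target, and the resulting scores are illustrative only. Two library behaviours are worth noting. According to the scikit-learn documentation, the `learning_rate` schedule is only used when `solver="sgd"`, so `"adaptive"` has no effect with Adam. With `early_stopping=True`, the estimator also sets aside a fraction of the training data (10% by default) as a validation set for the stopping criterion.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hyperparameters as listed in Table 2.
model = MLPRegressor(
    hidden_layer_sizes=(64, 64),
    activation="tanh",
    solver="adam",
    alpha=0.001,                # L2 regularization strength
    learning_rate_init=0.005,
    learning_rate="adaptive",   # only used when solver="sgd"; ignored by Adam
    batch_size=128,
    max_iter=5000,              # maximum number of epochs for Adam
    random_state=42,
    early_stopping=True,        # holds out validation_fraction=0.1 by default
    n_iter_no_change=60,        # epochs without validation improvement before stopping
)

# Synthetic standardized features and target (hypothetical, not the study data).
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 5))
y = 50.0 + 30.0 * np.tanh(X[:, 0]) + 10.0 * X[:, 1] + rng.normal(0.0, 2.0, 400)

model.fit(X, y)
print("epochs run:", model.n_iter_)
print("output activation:", model.out_activation_)
print("training R^2:", round(model.score(X, y), 3))
```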
Table 3. Model-evaluation results

| Metric | Training | Testing |
|---|---|---|
| Root mean square error (RMSE) | 82.42 | 35.71 |
| R$^2$ | 0.6027 | 0.6125 |
| Correlation | 0.8546 | 0.8212 |
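One way to read the RMSE and R$^2$ rows of Table 3 together is to back out the spread of the observed values that they imply. The sketch below assumes that R$^2$ was computed separately on each subset with the standard definition R$^2 = 1 - \mathrm{MSE}/\mathrm{Var}(y)$, which the source does not state explicitly. The function name is hypothetical.

```python
import math

def implied_target_sd(rmse: float, r2: float) -> float:
    """Standard deviation of observed values implied by R^2 = 1 - MSE / Var(y)."""
    return rmse / math.sqrt(1.0 - r2)

# Values taken from Table 3.
train_sd = implied_target_sd(82.42, 0.6027)
test_sd = implied_target_sd(35.71, 0.6125)

print(f"implied SD of observed leachate, training: {train_sd:.1f}")
print(f"implied SD of observed leachate, testing:  {test_sd:.1f}")
print(f"ratio training/testing: {train_sd / test_sd:.2f}")
```

Under that assumption, the training subset has an implied standard deviation of roughly 131 and the testing subset roughly 57, in the units of leachate volume. The gap in RMSE between the two subsets is therefore consistent with a difference in target variability rather than a difference in relative fit.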
The correlation coefficients of 0.8546 for training and 0.8212 for testing indicate a strong linear relationship between predicted and observed values. Although some predictions deviated substantially from the observed data, the ANN was generally able to capture the rising and falling trends in leachate volume. Figure 2 and Figure 3 further illustrate model performance. Some points in the training set were predicted less accurately, whereas the testing set showed closer agreement between predicted and observed values, supporting the model’s generalization ability.

In Figure 2, the ANN predictions closely follow temporal fluctuations in leachate volume, particularly the broader seasonal pattern of higher flows during wetter periods and lower flows during drier periods. Peak leachate events were generally captured, although some extreme spikes were slightly under- or overestimated, reflecting the difficulty of reproducing short-lived events that may have been influenced by site operations in addition to rainfall. Figure 3 shows a similar pattern for the testing dataset: the model reproduced most rising and falling limbs of the leachate hydrograph, with a short response lag between periods of intense rainfall and subsequent increases in leachate volume. These diagnostics indicate that the ANN captured the dominant dynamics of leachate generation even though some point-level predictions remained imperfect.

Several methodological strengths help explain these findings. First, the integration of primary and secondary data produced a robust dataset that reflected real landfill conditions. Second, preprocessing steps—including imputation, normalization, and categorical encoding—improved input quality for model training. Third, the use of adaptive optimization, regularization, and early stopping helped the ANN balance predictive accuracy and generalization, as reflected in the comparable performance of the training and testing datasets.
4. Discussion
The findings address the central research question of whether a deep-learning approach, specifically an MLP-based ANN, can reliably predict leachate volume at a tropical landfill site. The results show that the proposed model was able to identify and generalize nonlinear relationships among meteorological and landfill-related variables, explaining more than 60% of the variance in leachate generation. Although point-level prediction errors remained substantial, the consistency of overall trends and the strong correlation values indicate that ANN is a viable tool for supporting landfill leachate management [39], [52], [53].
From an operational perspective, an R$^2$ of about 0.61 and correlation coefficients above 0.82 suggest that the model is sufficiently accurate for medium-term planning decisions, such as sizing storage ponds, estimating pump capacity, and anticipating periods of elevated leachate generation, because it reproduces overall trends and seasonal peaks. However, the relatively high RMSE, especially during extreme events, indicates that the model should not yet be used as a stand-alone tool for real-time control of leachate discharge. In practice, operators should apply conservative safety factors when translating predicted volumes into design or operational set points. At this stage, the ANN is therefore better viewed as a screening-level decision-support tool than as a precise day-to-day forecasting system.
Compared with ANN and hybrid models developed to predict leachate COD or BOD, which often report correlation coefficients above 0.95 and single-digit percentage errors [45], [54], as well as LSTM-based approaches that can achieve R$^2$ values close to 1.0 for leachate quantity prediction at well-instrumented temperate landfills [51], the performance reported here is more modest. This difference is likely attributable to the greater stochasticity of leachate volume, the relatively short and noisy dataset available at Supit Urang, and the absence of detailed operational variables, such as pumping schedules and temporary storage conditions, in the input space. Nevertheless, the model fills an important gap by offering an operationally useful, although not yet fully optimized, predictive tool tailored to a tropical landfill context.
These findings reveal both the strengths and the limitations of the model. On the one hand, the ANN captured broad patterns in leachate dynamics, showing that rainfall, humidity, and waste input jointly influence leachate generation. This is consistent with earlier studies emphasizing the role of hydrological balance and waste decomposition in leachate production, while extending the field by applying deep learning specifically to volume prediction in a tropical landfill context [30], [46], [55]. On the other hand, the relatively high RMSE values suggest that localized or episodic factors—such as sudden heavy rainfall, changes in waste composition, or landfill operating practices—may not yet be fully represented by the current model configuration [39].
When considered alongside the existing literature, these results reinforce the potential of ANN approaches in solid waste management. Earlier studies based on traditional hydrological balance models, such as HELP and LEACHM, often reported strong sensitivity to parameter specification and limited accuracy under tropical conditions [31], [33], [56], [57]. More recent ML-based studies have shown improvements in predicting waste-related outputs, although only a few have focused specifically on leachate volume [46], [48], [54], [58]. The present study therefore helps fill this gap by providing empirical evidence that ANN can extend predictive modelling into an area that is not well served by conventional techniques.
Beyond confirming previous knowledge, the study also has theoretical and practical implications. Theoretically, it shows that deep learning can complement established hydrological models by accommodating nonlinear relationships and complex interactions without extensive site-specific calibration. Practically, it provides landfill operators with a predictive tool that can support operational planning, including the estimation of treatment-unit capacity, the design of leachate storage systems, and the anticipation of peak flows during rainy periods. These applications are directly relevant to reducing environmental risks associated with uncontrolled leachate discharge.
Nevertheless, further refinement is needed. Improvements in data quality—particularly through higher-resolution rainfall measurements, more detailed waste-composition data, and real-time monitoring systems—would likely improve model accuracy. In addition, future studies could explore alternative architectures such as LSTM networks, which are better suited to capturing temporal dependencies and may further improve predictive performance.
Overall, this research shows that ANN-based models can make a meaningful contribution to sustainable landfill management. By capturing complex relationships among meteorological and environmental variables, the model contributes both to the academic literature and to the practical toolkit of waste-management practitioners, while also identifying directions for continued methodological improvement.
5. Conclusions
This study developed and applied an ANN with an MLP architecture to predict landfill leachate volume by integrating meteorological and environmental variables. The model captured nonlinear interactions among rainfall, temperature, humidity, wind, and waste-related parameters, and its R$^2$ and correlation values indicated that it explained more than half of the variance in leachate volume while reproducing the overall trend of leachate generation. Despite these strengths, the relatively high RMSE values show that individual predictions still deviated substantially from observed values, reflecting both the complexity of leachate dynamics and the limitations of the available dataset and parameter configuration. These findings suggest that ANN-based approaches can serve as useful decision-support tools for anticipating leachate production, planning treatment capacity, and mitigating environmental risks. At the same time, the model’s performance remains strongly dependent on data quality, monitoring frequency, and architectural optimization. Accordingly, this study contributes to the growing literature on deep learning in environmental management while also offering practical insight for sustainable landfill operation in urban settings.
Future research should improve data resolution and diversity by incorporating real-time meteorological monitoring, more detailed waste-composition records, and operational landfill-management parameters. In addition, the use of more advanced architectures, such as LSTM networks or hybrid deep learning–hydrological models, may improve temporal prediction accuracy. Broader validation across multiple landfill sites and integration into decision-support systems would further strengthen the practical value of this approach for addressing waste-management and environmental-protection challenges in the digital era.
Author Contributions:
Conceptualization, A.M. and S.S.; data curation, A.M.; formal analysis, A.M. and S.B.; visualization, A.M. and S.S.; writing – original draft, A.M.; methodology, S.S. and S.B.; resources, S.S.; software, S.S.; validation, S.S. and S.B.; writing – review & editing, S.S., S.S., and S.B.; supervision, S.S.; project administration, S.S.; funding acquisition, S.S. All authors have read and agreed to the published version of the manuscript.
Data Availability:
The data used to support the findings of this study are available from the corresponding author upon request.
Acknowledgments:
The authors sincerely thank the Ministry of Higher Education, Science and Technology of the Republic of Indonesia for funding this research. The authors also thank the management of the Supit Urang Landfill, Malang City, for granting access and permission to conduct field measurements and data collection. Appreciation is also extended to Universitas Muhammadiyah Malang for providing academic and logistical support that facilitated the implementation of this study.
Conflicts of Interest:
The authors declare no conflict of interest.
