Grey Clustering Based Air Quality Index to Detect Urban Air Quality in Lima
Abstract:
Urban air pollution remains a persistent challenge in the Global South, where rapid urbanization, limited monitoring infrastructure, and weak regulatory frameworks hinder effective environmental governance. In Lima, Peru, one of the most polluted capitals in Latin America, elevated PM2.5 and PM10 concentrations continue to pose serious threats to public health and sustainable urban development. Traditional Air Quality Indices (AQIs), such as the U.S. EPA standard, often struggle to account for data uncertainty, pollutant interactions, and spatial heterogeneity. To address these gaps, this study introduces a novel AQI based on grey systems theory, applying a grey clustering framework enhanced with center-point triangular whitenization weight functions (CTWF). The model was specifically designed to handle ambiguous data and overlapping pollution categories. It was applied to daily PM2.5 and PM10 data from nine monitoring stations across metropolitan Lima, with validation conducted against both Peru’s national air quality standards and the U.S. EPA AQI. Results showed that the proposed index outperformed conventional methods under uncertain conditions, revealing critical spatial disparities often missed by traditional models. Beyond diagnostic accuracy, the index offers a scalable and transferable tool for urban planners and decision-makers to support targeted interventions, inform policy development, and advance the Sustainable Development Goals, specifically SDG 3 (Good Health and Well-Being) and SDG 11 (Sustainable Cities and Communities).
1. Introduction
Urban air pollution is one of the most pressing environmental and public health challenges of the 21st century. Among pollutants, particulate matter (PM2.5 and PM10) has been consistently linked to respiratory and cardiovascular diseases, reduced life expectancy, and increased mortality rates (Kebe et al., 2025; World Health Organization, 2021). These risks are particularly acute in cities of the Global South, where rapid urbanization and limited regulatory capacity exacerbate exposure to hazardous air quality.
Recent research has introduced diverse approaches to improve the assessment of urban air quality, including statistical forecasting in India (Yadav & Ganguly, 2025), artificial neural network model for prediction of air pollution index (Basir et al., 2025), hybrid neural networks in Iraq (Altahaan & Dobslaw, 2025), and integration of socio-demographic patterns with sensor-based measurements (Veres et al., 2025). While these methods provide valuable insights, they depend on dense monitoring networks as well as large and continuous datasets, rarely present in urban contexts with scarce resources. As a result, their applicability in cities with fragmented infrastructures, such as Lima, remains limited.
Conventional Air Quality Indices (AQIs), such as those of the United States Environmental Protection Agency (EPA, 2024) and the Ministry of Ecology and Environment in China (Qin et al., 2024), rely on rigid thresholds and dominance rules that prioritize single pollutants. These frameworks are effective for public communication but insufficient for urban planning as they cannot capture the interaction of pollutants, manage data gaps, and reflect spatial heterogeneity. These gaps underscore the need for more adaptive and uncertainty-resilient diagnostic tools.
Grey systems theory, specifically grey clustering with center-point triangular whitenization weight functions (CTWF), provides a robust framework to address these limitations. Unlike AI or purely statistical models, grey clustering does not require large datasets and can explicitly incorporate incomplete or limited information (Delgado et al., 2018; Delgado & Romero, 2017; Li, 2013). It is therefore particularly suitable for data-scarce cities, where pollutants interact in complex ways and monitoring systems are inconsistent.
Lima, Peru, exemplifies the challenges of a megacity with severe PM2.5 and PM10 pollution, a fragmented monitoring network, and high socio-environmental inequality. To address these conditions, this study developed and applied a novel AQI based on grey clustering with CTWF, using daily data from nine monitoring stations (Ministerio del Ambiente-SENAMHI, 2024). The results were validated against both Peruvian environmental standards (MINAM, 2017) and the U.S. EPA AQI (EPA, 2024), demonstrating the capacity of the model to generate nuanced classifications and actionable insights. More broadly, the proposed index contributes to Sustainable Development Goal (SDG) 3 (Good Health and Well-Being) and SDG 11 (Sustainable Cities and Communities) by offering a scalable decision-support tool for urban environmental governance.
2. Literature Review
Urban air quality has become a central theme in Environmental Science due to its critical impact on public health and sustainable city planning. Conventional AQIs, while widely used, are often criticized for their rigid classification rules and limited capability to integrate multi-pollutant dynamics or capture uncertainty under data-sparse conditions. In response, recent research has explored a variety of statistical, AI-driven, and grey system approaches to enhance pollutant assessment and decision-making in complex urban environments.
Statistical models continue to play a vital role in air quality assessment, particularly in urban regions with expanding but incomplete monitoring infrastructures. Zhang et al. (2025a) introduced an innovative real-time monitoring system utilizing drone-mounted mass spectrometry to achieve high spatiotemporal-resolution mapping of air pollutants in complex urban environments. Their system provided enhanced capabilities for detecting localized emission sources and transient pollution events, thus contributing significantly to urban air quality surveillance (Granella et al., 2024; Kazemi et al., 2025).
Istiana et al. (2023) further explored data-driven methodologies by conducting a causality analysis between air quality parameters and meteorological conditions in Jakarta. Their findings confirmed strong correlations between PM2.5 concentrations and meteorological variables such as humidity and wind speed, thus emphasizing the potential for incorporating climate data into predictive pollution models to improve their reliability and responsiveness under dynamic environmental conditions (Istiana et al., 2023; Tume-Bruce et al., 2022).
Although applied in a different domain, Garini et al. (2025) demonstrated the effectiveness of data pre-processing and imputation techniques by developing a filling-well method for handling incomplete datasets. Their work highlighted the broader applicability of data cleansing and optimization techniques to improve the quality and accuracy of machine learning models in environmental and geospatial analyses.
The application of AI-powered frameworks has significantly advanced environmental monitoring and air quality prediction in recent years. Basir et al. (2025) proposed an autoencoder artificial neural network model to predict the Air Pollution Index (API) with high accuracy, particularly in data-sparse urban environments. Their approach utilized feature extraction and dimensionality reduction to improve model performance, rendering it suitable for complex atmospheric datasets (Basir et al., 2025; Roslan et al., 2025).
Lakshmi & Krishnamoorthy (2024) further enhanced predictive modelling via a multi-step air quality forecasting system with a bidirectional convolutional long short-term memory (ConvLSTM) encoder-decoder architecture with a spatial-temporal attention (STA) mechanism. Their model demonstrated superior accuracy in predicting PM2.5 and PM10 concentrations across varying temporal horizons compared to traditional machine learning methods.
In a complementary study, Ortiz-Grisales et al. (2025) developed a temperature-sensitive dynamic modelling approach that incorporated the interaction between PM2.5 and negative ions to predict indoor and outdoor air quality. Their results emphasized the necessity of incorporating local microclimatic conditions into modelling frameworks to improve exposure risk predictions and adapt forecasting to real-time environmental variations.
An emerging trend in air quality research is the integration of community-based monitoring with low-cost sensor technologies to address the limitations of traditional regulatory networks. Veres et al. (2025) conducted a comprehensive study in Târgu Mureș in Romania by combining socio-demographic data with sensor-based air quality measurements. Their research revealed notable discrepancies between citizen perceptions of pollution and actual sensor data, thus underlining the importance of participatory approaches in improving environmental awareness and policy formulation.
Caselles Nuñez et al. (2025) contributed to this field by designing and implementing a dual indoor–outdoor air quality measurement device capable of detecting hazardous gases and particulate matter. The system demonstrated over 98% measurement accuracy compared to reference monitoring stations in Colombia, thus providing a scalable and affordable solution for widespread deployment in urban settings, particularly in low-resource environments.
These advances illustrated the potential of combining participatory monitoring and low-cost sensing technologies to enhance data availability, promote public engagement, and empower communities devoted to environmental governance.
As urban environmental systems become increasingly complex, there has been growing interest in applying multi-criteria decision-making frameworks, such as grey clustering, to manage uncertainty and data scarcity in air quality assessment. Karmoude et al. (2025) and Zhang et al. (2025b) analysed the spatiotemporal variations of air quality in the Sichuan-Chongqing region in China between 2016 and 2020. Their studies demonstrated how geographic and socio-economic factors influenced pollutant patterns and emphasized the need for spatially adaptive diagnostic tools.
Xu & Luo (2025) provided additional insights into the benefits of spatial planning by evaluating the relationship between urban clusters and efforts of environmental protection in China. They found that cities employing tighter regulatory coordination and integrated development strategies achieved significantly better air quality outcomes, particularly in reducing NOₓ and PM concentrations, hence highlighting the effectiveness of regional governance models.
Recent advances in air quality monitoring have increasingly aligned with the principles of the United Nations Sustainable Development Goals (SDGs), particularly SDG 3 (Good Health and Well-Being) and SDG 11 (Sustainable Cities and Communities).
Aman et al. (2025) developed a machine learning framework to quantify the effects of emissions and meteorological factors on PM2.5 in Greater Bangkok, thus contributing to SDG 3 through improved early-warning systems and health risk assessments. Similarly, Kirana et al. (2025) designed a spatiotemporal web Geographic Information System (GIS) decision-support platform that enables local governments in Indonesia to respond rapidly to pollution events and optimize urban planning, directly advancing SDG 11.
Complementary approaches such as grey clustering and multivariable diagnostics (Xu & Luo, 2025; Zhang et al., 2025b) provide adaptive and data-efficient frameworks for sustainable urban management. In addition, community-based sensing initiatives (Veres et al., 2025) and low-cost monitoring systems (Caselles Nuñez et al., 2025) strengthen environmental awareness and participatory governance.
In summary, previous studies using statistical, AI-driven, and participatory approaches have advanced air quality assessment, yet most depended on extensive datasets and were limited in handling uncertainty and data gaps typical of cities in the Global South. Grey system models addressed part of these challenges but were applied primarily to single-pollutant or non-urban contexts. This study built upon these foundations by adapting the grey clustering approach with CTWF to a multi-pollutant and data-scarce urban setting, with an aim to provide a flexible and transferable diagnostic tool that bridges methodological innovation and the needs of sustainable urban planning.
3. Methodology
Urban sustainability planning in data-scarce cities faces major challenges due to limited monitoring infrastructure, regulatory inconsistencies, and high spatial variability in pollution levels. Traditional quantitative models typically require large and consistent datasets, assuming deterministic or probabilistic relationships rarely met in fragmented urban environments. To address these limitations, grey systems theory offers a robust alternative.
Grey systems theory, as discussed by Febrina et al. (2025), is specifically designed to model uncertainty and incomplete information. It has proven effective in socio-environmental decision-making processes where data are limited (Liu & Lin, 2011). Unlike fuzzy logic or probabilistic models, grey models operate on both known and unknown (or “grey”) variables, rendering them particularly suitable for urban systems marked by partial observability and monitoring gaps.
According to grey systems theory, grey clustering enables classification of objects such as air quality observations into predefined grey classes using whitenization functions. This approach is ideal for cities like Lima, where pollutants such as PM2.5 and PM10 interact in non-linear and uncertain ways across the spatial and temporal scales.
CTWF is a technique that enhances the interpretability of grey clustering by assigning triangular membership curves to each pollutant category. CTWF has been successfully applied in domains such as water quality (Delgado et al., 2018), evaluation of social impact (Delgado & Romero, 2018), and assessment of human resources (Li, 2013), but has yet to be extensively used in air quality frameworks. This study adapted CTWF to the data of PM2.5 and PM10, in order to offer a flexible model for classifying urban air quality when data uncertainty and pollutant overlap are prevalent.
Owing to the incorporation of uncertainty proposed by grey systems theory, the air quality assessment index for particulate matter (PM2.5 and PM10) in this study outperformed traditional models, including the Delphi method (Hwang & Kim, 2023; Maghsoudi et al., 2024) and the Analytic Hierarchy Process (AHP) (Charchaoui & El Moudden, 2024; Wang et al., 2024). The CTWF approach, a central component of grey clustering analysis, can also be integrated with other techniques, including the Shannon entropy method (Delgado & Romero, 2018), correlation analysis (Yang & Lin, 2024), and system-based Internet of Things (IoT) (Luo et al., 2024; Morales et al., 2022).
The grey clustering method implemented via incidence matrices or weight functions enables the evaluation of multiple correlated criteria (Liu et al., 2017; Lv & Liu, 2024; Tao et al., 2020). This work adopted the CTWF approach, which is particularly suitable for classifying levels of air quality based on the measurement of PM2.5 and PM10, as shown in Figure 1.

The CTWF approach follows a six-step structured sequence adapted from previous literature (Delgado et al., 2018; Zeng, 2022; Zhao et al., 2015). It is summarized below.
Let:
Objects (m): number of monitoring sites (e.g., urban stations)
Criteria (n): number of pollutants (PM2.5 and PM10)
Grey classes (s): number of air quality classes (e.g., Good, Moderate, Unhealthy)
$x_{ij}$: raw value of pollutant 𝑗 at site 𝑖
Step 1: Definition of grey class
Air quality thresholds were derived from Peruvian environmental quality standards (EQS). Each class midpoint (𝜆) was computed as the mean of the upper and lower bounds of the legal range. For example, PM10 class midpoints were 27.0, 104.5, 204.5, and 304.5 µg/m3 (MINAM, 2017).
Step 2: Data normalization
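Each raw value is made dimensionless by (1):
$$x_{ij}^{\prime} = \frac{x_{ij}}{\bar{x}_j} \tag{1}$$
where $\bar{x}_j$ is the arithmetic mean of the class midpoints for pollutant 𝑗. The class midpoints are normalized in the same way.
Step 3: Membership function using CTWF
Each pollutant class 𝑘 is assigned a triangular function by (2)–(4), where $\lambda_j^k$ denotes the midpoint of class 𝑘 for pollutant 𝑗, expressed on the same normalized scale as $x_{ij}$:
$$f_j^1(x_{ij}) = \begin{cases} 1, & x_{ij} \in [0, \lambda_j^1] \\ \dfrac{\lambda_j^2 - x_{ij}}{\lambda_j^2 - \lambda_j^1}, & x_{ij} \in (\lambda_j^1, \lambda_j^2) \\ 0, & x_{ij} \in [\lambda_j^2, +\infty) \end{cases} \tag{2}$$
$$f_j^k(x_{ij}) = \begin{cases} \dfrac{x_{ij} - \lambda_j^{k-1}}{\lambda_j^k - \lambda_j^{k-1}}, & x_{ij} \in (\lambda_j^{k-1}, \lambda_j^k] \\ \dfrac{\lambda_j^{k+1} - x_{ij}}{\lambda_j^{k+1} - \lambda_j^k}, & x_{ij} \in (\lambda_j^k, \lambda_j^{k+1}) \\ 0, & \text{otherwise} \end{cases} \quad k = 2, \ldots, s-1 \tag{3}$$
$$f_j^s(x_{ij}) = \begin{cases} 0, & x_{ij} \in [0, \lambda_j^{s-1}] \\ \dfrac{x_{ij} - \lambda_j^{s-1}}{\lambda_j^s - \lambda_j^{s-1}}, & x_{ij} \in (\lambda_j^{s-1}, \lambda_j^s) \\ 1, & x_{ij} \in [\lambda_j^s, +\infty) \end{cases} \tag{4}$$
Step 4: Grey weight for each criterion
To avoid bias from unequal pollutant influence, the weights $\eta_j^k$ are calculated by (5), using the normalized class midpoints $\lambda_j^k$:
$$\eta_j^k = \frac{1/\lambda_j^k}{\sum_{j=1}^{n} 1/\lambda_j^k} \tag{5}$$
Step 5: Clustering coefficient
The clustering coefficient $\sigma_i^k$ for observation 𝑖 in class 𝑘 is calculated by (6):
$$\sigma_i^k = \sum_{j=1}^{n} f_j^k(x_{ij})\,\eta_j^k \tag{6}$$
Step 6: Classification rule
The air quality class assigned to each site 𝑖 corresponds to the class 𝑘* with the highest clustering coefficient, determined by (7):
$$\sigma_i^{k^*} = \max_{1 \le k \le s} \left\{ \sigma_i^k \right\} \tag{7}$$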
Although this model was calibrated using Peruvian environmental thresholds, its architecture is entirely modular. The CTWF grey clustering process can be adapted to any region by modifying the grey classes according to local or international air quality regulations (e.g., World Health Organization, EPA, Ministry of Ecology and Environment in China). The index is particularly valuable in cities with sparse sensor networks, inconsistent monitoring, or fragmented regulatory frameworks, as these conditions prevail in many developing countries. Its low data requirement and high interpretability enable it to be a promising diagnostic tool for sustainable environmental governance.
Unlike previous applications of grey clustering in environmental analysis, this study adapted the CTWF formulation for multi-pollutant urban air quality assessment. The method integrates simultaneous PM2.5 and PM10 concentrations, recalibrates class thresholds based on both national (Peruvian) and international (EPA) AQI standards, and introduces weighting adjustments to account for data uncertainty and incomplete monitoring. These adaptations extend the conventional CTWF grey clustering approach, turning it into a robust and transferable diagnostic tool for urban environmental planning under data-scarce conditions.
4. Case Study: Lima, Peru
Lima, the capital of Peru and a complex urban ecosystem, is home to over 10 million residents. It has undergone rapid and often unplanned urban expansion, leading to land-use conflicts, fragmented transport systems, and severe environmental degradation. In the city, the dense fleet of over 1.6 million vehicles, many old and poorly regulated, contributes significantly to air pollution by PM2.5 and PM10. These conditions are compounded by the geographic position of Lima in a coastal desert basin, where atmospheric inversions trap pollutants during the dry season (MINAM, 2017).
These challenges turn Lima into a critical test case for air quality models. It is one of the most polluted capitals in Latin America (World Health Organization, 2021), and features high socio-environmental inequality: industrial corridors are often adjacent to informal housing, and low-income districts bear the brunt of pollutant exposure. Moreover, the limited public environmental infrastructure and monitoring coverage in Lima highlight its need for data-efficient and adaptive diagnostic tools like the grey clustering method in this study.
Nine air quality monitoring stations were selected to represent the diverse urban conditions in Lima, as they are distributed across districts with varied land use, density, and socioeconomic profiles. Figure 2 illustrates the geographic distribution of the nine monitoring stations across Lima. The map shows the spatial diversity of the selected sites, ranging from peripheral districts such as Carabayllo and Villa María del Triunfo to central areas like San Borja and Campo de Marte. This spatial configuration captures contrasting urban environments, from industrial and high-traffic zones to residential and green areas, hence allowing a representative assessment of air quality variability across the city.

Data were obtained from SENAMHI (Servicio Nacional de Meteorología e Hidrología del Perú) and corresponded to daily average concentrations of PM2.5 and PM10 for February 2024, a month characterized by high stagnation and dry conditions (Ministerio del Ambiente-SENAMHI, 2024). The data were quality-checked, and only stations with over 95% completeness were retained. The locations of the monitoring stations are presented in Table 1, and the data collected is presented in Table 2.
Table 1. Monitoring stations selected in Lima
Num (m) | Station | Location | Code |
1 | Carabayllo | Peripheral, low-density sprawl | CRB |
2 | San Martín de Porres | Central-north, residential | SMP |
3 | San Juan de Lurigancho | High-density, high-traffic | SJL |
4 | Ceres | Eastern growing urban frontiers | CRS |
5 | Pariachi | Eastern growing urban frontiers | PAR |
6 | Santa Anita | Logistics-industrial zone | STA |
7 | Villa María del Triunfo | Southern low-income periphery | VMT |
8 | San Borja | Middle-upper class residential area | SBJ |
9 | Campo de Marte | Central urban park | CMD |
Table 2. PM2.5 and PM10 concentrations at the monitoring stations (µg/m3)
Object | CRB | SMP | SJL | CRS | STA | PAR | SBJ | CMD | VMT |
PM2.5 | 28.8 | 9.6 | 18.4 | 28.9 | 15.3 | 27.4 | 15.1 | 11.9 | 15.2 |
PM10 | 41.0 | 20.3 | 39.1 | 58.0 | 25.5 | 54.2 | 50.6 | 24.5 | 33.7 |
Two criteria were established in this study, corresponding to the two particulate matter (PM) fractions that affect air quality and are monitored in Lima, as presented in Table 3.
Table 3. Criteria used in the case study
Number (n) | Criterion | Description |
1 | PM10 | Particulate matter with a diameter of less than 10 microns |
2 | PM2.5 | Particulate matter with a diameter of less than 2.5 microns |
The grey classes were defined according to the law from the government of Peru, specifically the environmental quality standards for PM10 and PM2.5 (Ministerio del Ambiente-SENAMHI, 2024), which are presented in Table 4 and Table 5.
Table 4. Grey classes for PM10
Number (s) | Range (µg/m3) | Description |
1 | 0 – 54 | Good |
2 | 55 – 154 | Moderate |
3 | 155 – 254 | Unhealthy for sensitive groups |
4 | 255 – 354 | Unhealthy |
Table 5. Grey classes for PM2.5
Number (s) | Range (µg/m3) | Description |
1 | 0 – 12 | Good |
2 | 12.1 – 35.4 | Moderate |
3 | 35.5 – 55.4 | Unhealthy for sensitive groups |
4 | 55.5 – 150.4 | Unhealthy |
Step 1: From Table 4 and Table 5, the centre points of the grey classes were determined. The results are presented in Table 6.
Table 6. Centre points of the grey classes (µg/m3)
Number (s) | PM10 | PM2.5 | Code | Description |
1 | 27.0 | 6.0 | λ1 | Good |
2 | 104.5 | 23.8 | λ2 | Moderate |
3 | 204.5 | 45.4 | λ3 | Unhealthy for sensitive groups |
4 | 304.5 | 103.0 | λ4 | Unhealthy |
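As a quick check on Step 1, the centre points can be recomputed from the class ranges in Table 4 and Table 5. The following is a minimal Python sketch; the names `CLASS_RANGES` and `class_midpoints` are illustrative and do not come from the original study.

```python
# Class ranges (µg/m3) as listed in Table 4 (PM10) and Table 5 (PM2.5).
CLASS_RANGES = {
    "PM10": [(0, 54), (55, 154), (155, 254), (255, 354)],
    "PM2.5": [(0, 12), (12.1, 35.4), (35.5, 55.4), (55.5, 150.4)],
}


def class_midpoints(ranges):
    """Return the centre point of each (lower, upper) class range."""
    return [(lower + upper) / 2 for lower, upper in ranges]


midpoints = {p: class_midpoints(r) for p, r in CLASS_RANGES.items()}
for pollutant, values in midpoints.items():
    print(pollutant, [round(v, 2) for v in values])
```

The PM10 centre points reproduce Table 6 exactly. The PM2.5 centre points come out at roughly 23.75, 45.45, and 102.95 before rounding, which Table 6 reports to one decimal place.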
Step 2: The monitoring values from Table 2 and the class centre points from Table 6 were normalized (made dimensionless) by dividing each value by the arithmetic mean of the centre points of the corresponding pollutant. The results are presented in Table 7, Table 8, and Table 9.
Table 7. Class centre points and their arithmetic mean (µg/m3)
Criterion | λ1 | λ2 | λ3 | λ4 | Arithmetic Mean |
PM2.5 | 6.0 | 23.8 | 45.4 | 103.0 | 44.5 |
PM10 | 27.0 | 104.5 | 204.5 | 304.5 | 160.1 |
Table 8. Normalized class centre points (dimensionless)
Criterion | λ1 | λ2 | λ3 | λ4 |
PM2.5 | 0.13 | 0.53 | 1.02 | 2.31 |
PM10 | 0.17 | 0.65 | 1.28 | 1.90 |
Table 9. Normalized monitoring values (dimensionless)
Object | CRB | SMP | SJL | CRS | STA | PAR | SBJ | CMD | VMT |
PM2.5 | 0.65 | 0.22 | 0.41 | 0.65 | 0.34 | 0.62 | 0.34 | 0.27 | 0.34 |
PM10 | 0.26 | 0.13 | 0.24 | 0.36 | 0.16 | 0.34 | 0.32 | 0.15 | 0.21 |
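The normalization in Step 2 amounts to dividing every value by the mean of the class centre points of its pollutant. A minimal Python sketch for the first object (CRB) is given below; the helper name `normalize` is an assumption made for illustration, and the inputs are copied from Table 2 and Table 6.

```python
# Class centre points (µg/m3) from Table 6.
CENTRES = {
    "PM2.5": [6.0, 23.8, 45.4, 103.0],
    "PM10": [27.0, 104.5, 204.5, 304.5],
}
# Monitored values (µg/m3) for the first object (CRB) from Table 2.
CRB = {"PM2.5": 28.8, "PM10": 41.0}


def normalize(value, centres):
    """Divide a value by the arithmetic mean of the class centre points."""
    return value / (sum(centres) / len(centres))


crb_normalized = {p: normalize(CRB[p], CENTRES[p]) for p in CRB}
centres_normalized = {
    p: [normalize(c, cs) for c in cs] for p, cs in CENTRES.items()
}

print({p: round(v, 2) for p, v in crb_normalized.items()})
print({p: [round(v, 2) for v in vs] for p, vs in centres_normalized.items()})
```

Rounded to two decimals, the output agrees with the CRB column of Table 9 and with Table 8.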
Step 3: The CTWF were determined according to Figure 1 and (2)–(4). As an example, the functions for the first object (CRB) are presented in Figure 3 and (8)–(11).
Then, the values from Table 9 for CRB were substituted into (8)–(11) to obtain the CTWF values of CRB. The results are presented in Table 10.
Table 10. CTWF values of the first object (CRB)
Class | PM2.5 | PM10 |
λ1 | 0.000 | 0.819 |
λ2 | 0.765 | 0.181 |
λ3 | 0.235 | 0.000 |
λ4 | 0.000 | 0.000 |
Sum | 1.000 | 1.000 |
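The triangular membership functions of Step 3 can be sketched in a few lines of Python. The function name `ctwf_memberships` is hypothetical, and the sketch assumes the standard center-point form in which a value lying between two consecutive centre points is shared linearly between the two corresponding classes. Because the value and the centre points are divided by the same mean in Step 2, the memberships are identical whether raw or normalized inputs are used; raw values from Table 2 and Table 6 are used here.

```python
def ctwf_memberships(x, centres):
    """Membership of x in each grey class, given ascending class centre points."""
    s = len(centres)
    if x <= centres[0]:  # at or below the first centre: fully in the first class
        return [1.0] + [0.0] * (s - 1)
    if x >= centres[-1]:  # at or above the last centre: fully in the last class
        return [0.0] * (s - 1) + [1.0]
    memberships = [0.0] * s
    for k in range(s - 1):
        lower, upper = centres[k], centres[k + 1]
        if lower <= x <= upper:
            # Linear split between the two neighbouring classes.
            memberships[k + 1] = (x - lower) / (upper - lower)
            memberships[k] = 1.0 - memberships[k + 1]
            break
    return memberships


CENTRES = {
    "PM2.5": [6.0, 23.8, 45.4, 103.0],
    "PM10": [27.0, 104.5, 204.5, 304.5],
}
CRB = {"PM2.5": 28.8, "PM10": 41.0}

crb_ctwf = {p: ctwf_memberships(CRB[p], CENTRES[p]) for p in CRB}
for pollutant, values in crb_ctwf.items():
    print(pollutant, [round(v, 3) for v in values])
```

For PM10 the sketch reproduces the 0.819 and 0.181 of Table 10. For PM2.5 it gives values near 0.77 and 0.23, close to but not identical with the 0.765 and 0.235 reported in the table; the small gap presumably stems from rounding of intermediate values.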
Step 4: The weight of each criterion was calculated using (5) with the values from Table 8. The results are presented in Table 11.
Table 11. Weights of the criteria for each grey class
Criterion | λ1 | λ2 | λ3 | λ4 |
PM2.5 | 0.56 | 0.55 | 0.56 | 0.45 |
PM10 | 0.44 | 0.45 | 0.44 | 0.55 |
Step 5: The clustering coefficient was calculated using (6) with the values from Table 10 and Table 11. As an example, the results for the first object (CRB) are presented in Table 12.
Table 12. Clustering coefficients of the first object (CRB)
Class | PM2.5 | PM10 | $\sigma_i^k$ |
λ1 | 0.000 | 0.819 | 0.364 |
λ2 | 0.765 | 0.181 | 0.502 |
λ3 | 0.235 | 0.000 | 0.131 |
λ4 | 0.000 | 0.000 | 0.000 |
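Steps 4 and 5 can be reproduced with the short Python sketch below. It assumes that the weight of a pollutant in a class is the inverse of its normalized centre point, rescaled so that the weights of the two pollutants sum to one, and that the clustering coefficient is the weighted sum of the CTWF values. The function names are illustrative; the inputs are taken from Table 6 and Table 10.

```python
# Class centre points (µg/m3) from Table 6.
CENTRES = {
    "PM2.5": [6.0, 23.8, 45.4, 103.0],
    "PM10": [27.0, 104.5, 204.5, 304.5],
}
# CTWF values of CRB from Table 10, one list per pollutant (classes λ1 to λ4).
CRB_CTWF = {
    "PM2.5": [0.000, 0.765, 0.235, 0.000],
    "PM10": [0.819, 0.181, 0.000, 0.000],
}


def class_weights(centres):
    """Weight of each pollutant per class from inverse normalized centre points."""
    n_classes = len(next(iter(centres.values())))
    normalized = {
        p: [c / (sum(cs) / len(cs)) for c in cs] for p, cs in centres.items()
    }
    weights = {p: [] for p in centres}
    for k in range(n_classes):
        total = sum(1.0 / normalized[p][k] for p in centres)
        for p in centres:
            weights[p].append((1.0 / normalized[p][k]) / total)
    return weights


def clustering_coefficients(ctwf, weights):
    """Weighted sum of CTWF values over pollutants, one value per class."""
    n_classes = len(next(iter(ctwf.values())))
    return [sum(ctwf[p][k] * weights[p][k] for p in ctwf) for k in range(n_classes)]


weights = class_weights(CENTRES)
sigma_crb = clustering_coefficients(CRB_CTWF, weights)

print({p: [round(w, 2) for w in ws] for p, ws in weights.items()})
print([round(v, 3) for v in sigma_crb])
```

Under these assumptions the weights round to the values in Table 11, and the coefficients for CRB round to those in Table 12.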
Step 6: The maximum clustering coefficient was determined using (7) with the values from Table 12. As an example, the result for the first object (CRB) was $\sigma_i^{k^*}$ = 0.502, which corresponds to λ2 (Moderate). The same procedure was applied to all other objects, and the results are presented in Table 13.
Table 13. Clustering coefficients of all objects
Object | λ1 | λ2 | λ3 | λ4 |
CRB | 0.364 | 0.502 | 0.131 | 0.000 |
SMP | 0.887 | 0.112 | 0.000 | 0.000 |
SJL | 0.541 | 0.456 | 0.000 | 0.000 |
CRS | 0.266 | 0.599 | 0.133 | 0.000 |
STA | 0.710 | 0.287 | 0.000 | 0.000 |
PAR | 0.288 | 0.614 | 0.095 | 0.000 |
SBJ | 0.580 | 0.419 | 0.000 | 0.000 |
CMD | 0.815 | 0.184 | 0.000 | 0.000 |
VMT | 0.675 | 0.323 | 0.000 | 0.000 |
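The classification rule of Step 6 and the ordering of stations by membership strength follow directly from Table 13. The Python sketch below copies the table and applies the maximum rule; the names `TABLE_13`, `classify`, and `ranking` are illustrative.

```python
CLASSES = ["Good", "Moderate", "Unhealthy for sensitive groups", "Unhealthy"]

# Clustering coefficients per station for classes λ1 to λ4, copied from Table 13.
TABLE_13 = {
    "CRB": [0.364, 0.502, 0.131, 0.000],
    "SMP": [0.887, 0.112, 0.000, 0.000],
    "SJL": [0.541, 0.456, 0.000, 0.000],
    "CRS": [0.266, 0.599, 0.133, 0.000],
    "STA": [0.710, 0.287, 0.000, 0.000],
    "PAR": [0.288, 0.614, 0.095, 0.000],
    "SBJ": [0.580, 0.419, 0.000, 0.000],
    "CMD": [0.815, 0.184, 0.000, 0.000],
    "VMT": [0.675, 0.323, 0.000, 0.000],
}


def classify(coefficients):
    """Return (class index, coefficient) of the largest clustering coefficient."""
    best = max(range(len(coefficients)), key=coefficients.__getitem__)
    return best, coefficients[best]


# Group stations by assigned class, then sort each group by membership strength.
ranking = {}
for station, coefficients in TABLE_13.items():
    k, strength = classify(coefficients)
    ranking.setdefault(CLASSES[k], []).append((station, strength))
for members in ranking.values():
    members.sort(key=lambda item: item[1], reverse=True)

for name, members in ranking.items():
    print(name, " > ".join(f"{s} ({v:.3f})" for s, v in members))
```

Six stations fall in the Good class and three in the Moderate class, and within each class the stations are listed from the strongest to the weakest membership.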
5. Results and Discussion
The levels of air quality for nine districts in Lima were classified based on PM2.5 and PM10 concentrations during February 2024, with the CTWF-based grey clustering approach. These classifications, shown in Figure 3 and Figure 4, reflect both the categorical level and the relative membership strength within each grey class (Delgado et al., 2018).
Figure 3 presents locations categorized under λ1 (Good Air Quality): SMP (San Martín de Porres), CMD (Campo de Marte), STA (Santa Anita), VMT (Villa María del Triunfo), SBJ (San Borja), and SJL (San Juan de Lurigancho). As shown in Figure 3, districts with better air quality are generally concentrated in central and southern Lima, where vegetation cover is higher and industrial activity is lower. The predominance of “Good” classifications in these zones reflects lower emission densities and favourable dispersion conditions, consistent with the city’s spatial environmental gradient. Their membership strengths follow the order:
SMP (0.887) > CMD (0.815) > STA (0.710) > VMT (0.675) > SBJ (0.580) > SJL (0.541)
This means that the SMP monitoring point has better air quality than the SJL monitoring point, although all six monitoring points fall within the Good class.
Figure 4 shows zones classified under λ2 (Moderate Air Quality): PAR (Pariachi), CRS (Ceres), and CRB (Carabayllo), in the order:
PAR (0.614) > CRS (0.599) > CRB (0.502)
In addition, Figure 4 highlights that moderate air quality conditions are more prevalent in the eastern and northern peripheries of Lima, where rapid urban expansion, industrial corridors, and limited green coverage contribute to higher PM2.5 and PM10 concentrations. These spatial patterns confirm the influence of land use and emission sources on the distribution of particulate pollution across the metropolitan area.
In terms of ranking, PAR shows the strongest membership in the Moderate class and CRB the weakest, although all three monitoring points fall within the Moderate class.
These results offer more nuance than conventional AQIs by revealing how strongly a location fits into a category. For example, while SMP and SJL are both “Good”, SMP’s λ1 coefficient is substantially higher (0.887 versus 0.541), suggesting cleaner air within the same class (Liu & Lin, 2011).


The output of the grey clustering model was benchmarked against two frameworks for reference:
The U.S. EPA AQI uses a single-dominant pollutant approach to determine the categories of daily air quality (EPA, 2024). In several Lima districts, such as San Borja (SBJ) and Villa María del Triunfo (VMT), the EPA AQI classified air quality as “Moderate” on days when the levels of either PM2.5 or PM10 exceeded recommended thresholds. In contrast, the grey clustering model integrated both pollutants, resulting in a “Good” classification when the combined risk was low. This illustrates the multi-pollutant sensitivity of the model and its ability to offer a more nuanced assessment under borderline conditions.
The Peruvian EQS served as legal anchors to define the limits of the grey classes used in the model (MINAM, 2017). Some stations, such as Pariachi (PAR), were classified as “Moderate” despite occasionally exceeding the PM10 threshold. This reflects the flexibility of the model to accommodate variability and borderline cases, providing a more continuous and adaptive classification than rigid binary exceedance approaches.
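The contrast with a dominant-pollutant rule can be illustrated with a simplified Python sketch. It is not the official EPA AQI computation (which interpolates a sub-index for each pollutant on daily data); instead it assumes that each pollutant is assigned the class whose range in Table 4 or Table 5 contains its value from Table 2, and that the worst class wins. The grey classes are those with the highest coefficient in Table 13. All names are illustrative.

```python
CLASSES = ["Good", "Moderate", "Unhealthy for sensitive groups", "Unhealthy"]

# Upper bounds (µg/m3) of the class ranges in Table 5 (PM2.5) and Table 4 (PM10).
UPPER_BOUNDS = {
    "PM2.5": [12.0, 35.4, 55.4, 150.4],
    "PM10": [54.0, 154.0, 254.0, 354.0],
}
# Monitored values (µg/m3) from Table 2.
OBSERVED = {
    "CRB": {"PM2.5": 28.8, "PM10": 41.0},
    "SMP": {"PM2.5": 9.6, "PM10": 20.3},
    "SJL": {"PM2.5": 18.4, "PM10": 39.1},
    "CRS": {"PM2.5": 28.9, "PM10": 58.0},
    "STA": {"PM2.5": 15.3, "PM10": 25.5},
    "PAR": {"PM2.5": 27.4, "PM10": 54.2},
    "SBJ": {"PM2.5": 15.1, "PM10": 50.6},
    "CMD": {"PM2.5": 11.9, "PM10": 24.5},
    "VMT": {"PM2.5": 15.2, "PM10": 33.7},
}
# Grey clustering result: class with the highest coefficient in Table 13.
GREY_CLASS = {
    "CRB": "Moderate", "SMP": "Good", "SJL": "Good", "CRS": "Moderate",
    "STA": "Good", "PAR": "Moderate", "SBJ": "Good", "CMD": "Good", "VMT": "Good",
}


def threshold_class(value, upper_bounds):
    """Index of the first class whose upper bound is not exceeded."""
    for k, upper in enumerate(upper_bounds):
        if value <= upper:
            return k
    return len(upper_bounds) - 1  # values above the last bound stay in the top class


def dominant_pollutant_class(values):
    """Worst (highest) class over all pollutants of one station."""
    return CLASSES[max(threshold_class(v, UPPER_BOUNDS[p]) for p, v in values.items())]


dominant = {s: dominant_pollutant_class(v) for s, v in OBSERVED.items()}
differing = [s for s in OBSERVED if dominant[s] != GREY_CLASS[s]]
print(differing)
```

Under these simplifying assumptions, the stations where the two approaches disagree are those whose PM2.5 value lies just above the Good range while PM10 remains well inside it, which is the borderline situation discussed above for SBJ and VMT.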
The spatial distribution of air quality classification across Lima revealed clear patterns consistent with the well-documented environmental inequalities in the city (Mampitiya et al., 2024). Districts such as San Martín de Porres (SMP) and Campo de Marte (CMD), which benefit from greater vegetation cover and reduced vehicular congestion, showed the highest λ1 values, indicating better air quality. In contrast, districts on the eastern and northern peripheries, including Pariachi (PAR) and Carabayllo (CRB), reflected the effects of ongoing urban sprawl, industrial encroachment, and insufficient air quality governance.
These findings suggested important opportunities for targeted environmental interventions. Areas exhibiting high λ2 values could be prioritized for emission control strategies, such as restricting heavy freight traffic during peak hours, enhancing urban green infrastructure, or promoting public transport alternatives. A key advantage of the grey clustering index is its ability to provide continuous intra-category differentiation, allowing policymakers to identify and prioritize higher-risk zones even within the same regulatory category.
To evaluate the robustness of the proposed grey clustering based AQI, this study conducted a qualitative sensitivity assessment of the influence of class thresholds and weighting coefficients on classification outcomes.
The results of the model are inherently dependent on the definition of grey class boundaries and the relative weights assigned to PM2.5 and PM10. Small variations in these parameters, such as adjusting the midpoint values of the Peruvian environmental quality standards or modifying pollutant weights within ±10%, could shift individual monitoring stations between adjacent categories (e.g., from Good to Moderate). However, the comparative ranking of sites and the overall spatial pattern of air quality across Lima remain stable, indicating that the model is robust to moderate parameter perturbations.
Moreover, the use of CTWF inherently smooths transitions between classes, thus reducing the impact of abrupt changes of thresholds. This characteristic enhances the reliability of the model under uncertain or borderline conditions and supports its transferability to other regions with different regulatory standards. Future quantitative sensitivity testing, incorporating additional pollutants and multi-seasonal data, would further strengthen these findings and confirm the stability of the model under varying inputs.
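One way such a perturbation test could be set up is sketched below in Python. This is not the procedure used in the study, which was qualitative; it is an illustrative assumption in which all class centre points are scaled by a common factor, or the PM2.5 weight is scaled before renormalization, and the stations are reclassified. The parameter names `centre_scale` and `pm25_weight_scale` are hypothetical.

```python
# Class centre points (µg/m3, Table 6) and monitored values (µg/m3, Table 2).
CENTRES = {"PM2.5": [6.0, 23.8, 45.4, 103.0], "PM10": [27.0, 104.5, 204.5, 304.5]}
OBSERVED = {
    "CRB": {"PM2.5": 28.8, "PM10": 41.0}, "SMP": {"PM2.5": 9.6, "PM10": 20.3},
    "SJL": {"PM2.5": 18.4, "PM10": 39.1}, "CRS": {"PM2.5": 28.9, "PM10": 58.0},
    "STA": {"PM2.5": 15.3, "PM10": 25.5}, "PAR": {"PM2.5": 27.4, "PM10": 54.2},
    "SBJ": {"PM2.5": 15.1, "PM10": 50.6}, "CMD": {"PM2.5": 11.9, "PM10": 24.5},
    "VMT": {"PM2.5": 15.2, "PM10": 33.7},
}
CLASSES = ["Good", "Moderate", "Unhealthy for sensitive groups", "Unhealthy"]


def memberships(x, centres):
    """Triangular memberships of x given ascending class centre points."""
    if x <= centres[0]:
        return [1.0] + [0.0] * (len(centres) - 1)
    if x >= centres[-1]:
        return [0.0] * (len(centres) - 1) + [1.0]
    result = [0.0] * len(centres)
    for k in range(len(centres) - 1):
        if centres[k] <= x <= centres[k + 1]:
            result[k + 1] = (x - centres[k]) / (centres[k + 1] - centres[k])
            result[k] = 1.0 - result[k + 1]
            break
    return result


def classify_all(centre_scale=1.0, pm25_weight_scale=1.0):
    """Classify every station after scaling the centre points and the PM2.5 weight."""
    centres = {p: [c * centre_scale for c in cs] for p, cs in CENTRES.items()}
    boost = {"PM2.5": pm25_weight_scale, "PM10": 1.0}
    classes = {}
    for station, values in OBSERVED.items():
        f = {p: memberships(values[p], centres[p]) for p in values}
        sigma = []
        for k in range(len(CLASSES)):
            # Inverse normalized centre point per pollutant, optionally boosted.
            raw = {p: boost[p] * (sum(cs) / len(cs)) / cs[k] for p, cs in centres.items()}
            total = sum(raw.values())
            sigma.append(sum(f[p][k] * raw[p] / total for p in values))
        classes[station] = CLASSES[sigma.index(max(sigma))]
    return classes


baseline = classify_all()
scenarios = {
    "centres -10%": {"centre_scale": 0.9},
    "centres +10%": {"centre_scale": 1.1},
    "PM2.5 weight -10%": {"pm25_weight_scale": 0.9},
    "PM2.5 weight +10%": {"pm25_weight_scale": 1.1},
}
for label, kwargs in scenarios.items():
    perturbed = classify_all(**kwargs)
    changed = {s: c for s, c in perturbed.items() if c != baseline[s]}
    print(label, changed or "no change")
```

With unperturbed parameters the sketch reproduces the classes implied by Table 13. Printing only the stations whose class changes makes it easy to see which sites sit close to a class boundary.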
The strengths and limitations are as follows:
Strengths:
The simultaneous inclusion of PM2.5 and PM10 improves representativeness compared to single-pollutant indices.
Grey clustering handles uncertain or missing data effectively, making it suitable for cities with incomplete or fragmented monitoring networks.
The model architecture can be readily adapted to include additional pollutants or to comply with local regulatory standards with minimal structural modifications.
Limitations:
Class boundaries are dependent on legal standards, which may differ from international recommendations such as those established by the World Health Organization.
The current model evaluates daily or period-average values and does not reflect short-term spikes or intra-day fluctuations.
The model currently covers only PM2.5 and PM10; expanding it to include co-pollutants such as ozone (O3), nitrogen dioxide (NO2), or sulphur dioxide (SO2) would further enhance its diagnostic capabilities and robustness.
Another important limitation of this study is the temporal scope of the dataset. The analysis is based exclusively on daily PM2.5 and PM10 data collected during February 2024, which represents a single-month snapshot rather than a multi-seasonal record. Consequently, the findings should be interpreted as illustrative rather than conclusive, thus reflecting short-term spatial patterns rather than long-term trends. Future research incorporating longitudinal or multi-seasonal datasets would enable a comprehensive assessment of temporal variability and the robustness of the model.
6. Conclusions
This study proposed and validated an innovative AQI based on grey systems theory, utilizing CTWF to assess urban air quality in data-limited environments. By jointly evaluating PM2.5 and PM10 concentrations through a clustering framework, the model demonstrated robust capacity to classify pollution levels with greater flexibility and nuances than traditional AQIs. Unlike conventional approaches that rely on single-dominant pollutant rules and rigid thresholding, the proposed index enabled a continuous and adaptive assessment of air quality. This characteristic is particularly valuable for cities where pollutant interactions are complex and regulations are limited.
When applied to nine districts of Lima, Peru, the model successfully captured spatial disparities in air pollution levels and provided actionable insights for urban planning and environmental management. The index was able to highlight intra-category differences within the same level of air quality, thus offering a valuable tool for targeting interventions in vulnerable and high-risk neighbourhoods. Comparative benchmarking with the U.S. EPA AQI and Peruvian environmental standards confirmed the consistency of the model while highlighting its advantages in handling multiple pollutants and borderline cases.
More importantly, the grey clustering framework is not only accurate but also highly transferable. Its modular design allows easy recalibration to other pollutants, monitoring networks, or regulatory standards, rendering it an effective and scalable diagnostic tool for supporting Sustainable Development Goals (SDGs), particularly SDG 3 (Good Health and Well-Being) and SDG 11 (Sustainable Cities and Communities).
Future research should focus on expanding the model to include additional pollutants such as nitrogen dioxide (NO2), ozone (O3), sulphur dioxide (SO2), and potentially noise or light pollution. Incorporating temporal dynamics to reflect hourly and seasonal variability would further enhance the accuracy and decision-making capacity of the model. In summary, the proposed index represents a promising and policy-relevant contribution to the environmental management toolkit for rapidly urbanizing regions that face monitoring gaps and regulatory fragmentation.
The data used to support the research findings are available from the corresponding author upon request.
The author declares no conflict of interest.
$f_j^k\left(x_{i j}\right)$ | Membership function of pollutant 𝑗 at site 𝑖 for class 𝑘; dimensionless |
m | Number of monitoring sites; dimensionless |
n | Number of pollutants (e.g., PM2.5, PM10); dimensionless |
s | Number of air quality grey classes; dimensionless |
$x_{i j}$ | Raw value of pollutant 𝑗 at site 𝑖; µg/m³ |
$x_{i j}^{\prime}$ | Normalized value of pollutant 𝑗 at site 𝑖; dimensionless |
$\bar{x}_j$ | Arithmetic mean of the class midpoints of pollutant 𝑗; µg/m³ |
$\eta_j^k$ | Weight of pollutant 𝑗 in class 𝑘; dimensionless |
$\sigma_i^k$ | Clustering coefficient of monitoring site 𝑖 in class 𝑘; dimensionless |
$\sigma_i^{k^*}$ | Maximum clustering coefficient (final classification) at site 𝑖; dimensionless |
Greek symbols | |
λk | Central midpoint value of class 𝑘; µg/m³ |
Subscripts | |
i | Index for monitoring sites |
j | Index for pollutants (e.g., PM2.5, PM10) |
k | Index for grey class (e.g., Good, Moderate) |
