Grey Clustering Based Air Quality Index to Detect Urban Air Quality in Lima


Open Access

Research article


Alexi Delgado^*

Environmental Engineering Program, Universidad de Ciencias y Humanidades, 15304 Lima, Peru

Challenges in Sustainability | Volume 13, Issue 4, 2025 | Pages 546-559

https://doi.org/10.56578/cis130406

Received: 08-02-2025, Revised: 10-07-2025, Accepted: 10-27-2025, Available online: 11-12-2025


Abstract:

Urban air pollution remains a persistent challenge in the Global South, where rapid urbanization, limited monitoring infrastructure, and weak regulatory frameworks hinder effective environmental governance. In Lima, Peru, one of the most polluted capitals in Latin America, elevated PM_2.5 and PM₁₀ concentrations continue to pose serious threats to public health and sustainable urban development. Traditional Air Quality Indices (AQIs), such as the U.S. EPA standard, often struggle to account for data uncertainty, pollutant interactions, and spatial heterogeneity. To address these gaps, this study introduces a novel AQI based on grey systems theory, applying a grey clustering framework enhanced with center-point triangular whitenization weight functions (CTWF). The model was specifically designed to handle ambiguous data and overlapping pollution categories. It was applied to daily PM_2.5 and PM₁₀ data from nine monitoring stations across metropolitan Lima, with validation conducted against both Peru’s national air quality standards and the U.S. EPA AQI. Results showed that the proposed index outperformed conventional methods under uncertain conditions, revealing critical spatial disparities often missed by traditional models. Beyond diagnostic accuracy, the index offers a scalable and transferable tool for urban planners and decision-makers to support targeted interventions, inform policy development, and advance the Sustainable Development Goals, specifically SDG 3 (Good Health and Well-Being) and SDG 11 (Sustainable Cities and Communities).

Keywords: Urban sustainability, Air quality index, Grey clustering, Environmental planning, Particulate matter, PM_2.5, PM₁₀

1. Introduction

Urban air pollution is one of the most pressing environmental and public health challenges of the 21st century. Among pollutants, particulate matter (PM_2.5 and PM₁₀) has been consistently linked to respiratory and cardiovascular diseases, reduced life expectancy, and increased mortality rates (Kebe et al., 2025; World Health Organization, 2021). These risks are particularly acute in cities of the Global South, where rapid urbanization and limited regulatory capacity exacerbate exposure to hazardous air quality.

Recent research has introduced diverse approaches to improve the assessment of urban air quality, including statistical forecasting in India (Yadav & Ganguly, 2025), an artificial neural network model for predicting the air pollution index (Basir et al., 2025), hybrid neural networks in Iraq (Altahaan & Dobslaw, 2025), and the integration of socio-demographic patterns with sensor-based measurements (Veres et al., 2025). While these methods provide valuable insights, they depend on dense monitoring networks and on large, continuous datasets, which are rarely available in resource-scarce urban contexts. As a result, their applicability in cities with fragmented infrastructure, such as Lima, remains limited.

Conventional Air Quality Indices (AQIs), such as those of the United States Environmental Protection Agency (EPA, 2024) and the Ministry of Ecology and Environment in China (Qin et al., 2024), rely on rigid thresholds and dominance rules that prioritize single pollutants. These frameworks are effective for public communication but insufficient for urban planning as they cannot capture the interaction of pollutants, manage data gaps, and reflect spatial heterogeneity. These gaps underscore the need for more adaptive and uncertainty-resilient diagnostic tools.

Grey systems theory, specifically grey clustering with center-point triangular whitenization weight functions (CTWF), provides a robust framework to address these limitations. Unlike AI or purely statistical models, grey clustering does not require large datasets and can explicitly incorporate incomplete or limited information (Delgado et al., 2018; Delgado & Romero, 2017; Li, 2013). It is therefore particularly suitable for data-scarce cities, where pollutants interact in complex ways and monitoring systems are inconsistent.

Lima, Peru, exemplifies the challenges of a megacity with severe PM_2.5 and PM₁₀ pollution, a fragmented monitoring network, and high socio-environmental inequality. To address these conditions, this study developed and applied a novel AQI based on grey clustering with CTWF, using daily data from nine monitoring stations (Ministerio del Ambiente-SENAMHI, 2024). The results were validated against both Peruvian environmental standards (MINAM, 2017) and the U.S. EPA AQI (EPA, 2024), demonstrating the capacity of the model to generate nuanced classifications and actionable insights. More broadly, the proposed index contributes to Sustainable Development Goal (SDG) 3 (Good Health and Well-Being) and SDG 11 (Sustainable Cities and Communities) by offering a scalable decision-support tool for urban environmental governance.

2. Literature Review

Urban air quality has become a central theme in environmental science due to its critical impact on public health and sustainable city planning. Conventional AQIs, while widely used, are often criticized for their rigid classification rules and limited capability to integrate multi-pollutant dynamics or capture uncertainty under data-sparse conditions. In response, recent research has explored a variety of statistical, AI-driven, and grey system approaches to enhance pollutant assessment and decision-making in complex urban environments.

2.1 Statistical and Data-Driven Models

Statistical models continue to play a vital role in air quality assessment, particularly in urban regions with expanding but incomplete monitoring infrastructures. Zhang et al. (2025a) introduced an innovative real-time monitoring system utilizing drone-mounted mass spectrometry to achieve high spatiotemporal-resolution mapping of air pollutants in complex urban environments. Their system provided enhanced capabilities for detecting localized emission sources and transient pollution events, thus contributing significantly to urban air quality surveillance (Granella et al., 2024; Kazemi et al., 2025).

Istiana et al. (2023) further explored data-driven methodologies by conducting a causality analysis between air quality parameters and meteorological conditions in Jakarta. Their findings confirmed strong correlations between PM_2.5 concentrations and meteorological variables such as humidity and wind speed, thus emphasizing the potential for incorporating climate data into predictive pollution models to improve their reliability and responsiveness under dynamic environmental conditions (Istiana et al., 2023; Tume-Bruce et al., 2022).

Although applied in a different domain, Garini et al. (2025) demonstrated the effectiveness of data pre-processing and imputation techniques by developing a filling-well method for handling incomplete datasets. Their work highlighted the broader applicability of data cleansing and optimization techniques to improve the quality and accuracy of machine learning models in environmental and geospatial analyses.

2.2 AI and Machine Learning Approaches

The application of AI-powered frameworks has significantly advanced environmental monitoring and air quality prediction in recent years. Basir et al. (2025) proposed an autoencoder artificial neural network model to predict the Air Pollution Index (API) with high accuracy, particularly in data-sparse urban environments. Their approach utilized feature extraction and dimensionality reduction to improve model performance, rendering it suitable for complex atmospheric datasets (Basir et al., 2025; Roslan et al., 2025).

Lakshmi & Krishnamoorthy (2024) further enhanced predictive modelling with a multi-step air quality forecasting system that combines a bidirectional convolutional long short-term memory (ConvLSTM) encoder-decoder architecture with a spatial-temporal attention (STA) mechanism. Their model demonstrated superior accuracy in predicting PM_2.5 and PM₁₀ concentrations across varying temporal horizons compared to traditional machine learning methods.

In a complementary study, Ortiz-Grisales et al. (2025) developed a temperature-sensitive dynamic modelling approach that incorporated the interaction between PM_2.5 and negative ions to predict indoor and outdoor air quality. Their results emphasized the necessity of incorporating local microclimatic conditions into modelling frameworks to improve exposure risk predictions and adapt forecasting to real-time environmental variations.

2.3 Participatory and Low-Cost Sensing Innovations

An emerging trend in air quality research is the integration of community-based monitoring with low-cost sensor technologies to address the limitations of traditional regulatory networks. Veres et al. (2025) conducted a comprehensive study in Târgu Mureș in Romania by combining socio-demographic data with sensor-based air quality measurements. Their research revealed notable discrepancies between citizen perceptions of pollution and actual sensor data, thus underlining the importance of participatory approaches in improving environmental awareness and policy formulation.

Caselles Nuñez et al. (2025) contributed to this field by designing and implementing a dual indoor–outdoor air quality measurement device capable of detecting hazardous gases and particulate matter. The system demonstrated over 98% accuracy in measurement compared to reference monitoring stations in Colombia, thus providing a scalable and affordable solution for widespread deployment in urban settings, particularly in low-resource environments.

These advances illustrated the potential of combining participatory monitoring and low-cost sensing technologies to enhance data availability, promote public engagement, and empower communities devoted to environmental governance.

2.4 Grey Clustering and Multivariable Diagnostics

As urban environmental systems become increasingly complex, there has been growing interest in applying multi-criteria decision-making frameworks such as grey clustering to manage uncertainty and data scarcity in air quality assessment. Karmoude et al. (2025) and Zhang et al. (2025b) analysed the spatiotemporal variations of air quality in the Sichuan-Chongqing region in China between 2016 and 2020. Their study demonstrated how geographic and socio-economic factors influenced pollutant patterns and emphasized the need for spatially adaptive diagnostic tools.

Xu & Luo (2025) provided additional insights into the benefits of spatial planning by evaluating the relationship between urban clusters and efforts of environmental protection in China. They found that cities employing tighter regulatory coordination and integrated development strategies achieved significantly better air quality outcomes, particularly in reducing NOₓ and PM concentrations, hence highlighting the effectiveness of regional governance models.

2.5 Linkages to Sustainable Development Goals

Recent advances in air quality monitoring have increasingly aligned with the principles of the United Nations Sustainable Development Goals (SDGs), particularly SDG 3 (Good Health and Well-Being) and SDG 11 (Sustainable Cities and Communities).

Aman et al. (2025) developed a machine learning framework to quantify the effects of emissions and meteorological factors on PM_2.5 in Greater Bangkok, thus contributing to SDG 3 through improved early-warning systems and health risk assessments. Similarly, Kirana et al. (2025) designed a spatiotemporal web Geographic Information System (GIS) decision-support platform that enables local governments in Indonesia to respond rapidly to pollution events and optimize urban planning, directly advancing SDG 11.

Complementary approaches, such as grey clustering and multivariable diagnostics (Xu & Luo, 2025; Zhang et al., 2025b), provide adaptive and data-efficient frameworks for sustainable urban management. In addition, community-based sensing initiatives (Veres et al., 2025) and low-cost monitoring systems (Caselles Nuñez et al., 2025) strengthen environmental awareness and participatory governance.

In summary, previous studies using statistical, AI-driven, and participatory approaches have advanced air quality assessment, yet most depended on extensive datasets and were limited in handling uncertainty and data gaps typical of cities in the Global South. Grey system models addressed part of these challenges but were applied primarily to single-pollutant or non-urban contexts. This study built upon these foundations by adapting the grey clustering approach with CTWF to a multi-pollutant and data-scarce urban setting, with an aim to provide a flexible and transferable diagnostic tool that bridges methodological innovation and the needs of sustainable urban planning.

3. Methodology

3.1 Grey Systems Theory and Its Relevance to Urban Environmental Planning

Urban sustainability planning in data-scarce cities faces major challenges due to limited monitoring infrastructure, regulatory inconsistencies, and high spatial variability in pollution levels. Traditional quantitative models typically require large and consistent datasets, assuming deterministic or probabilistic relationships rarely met in fragmented urban environments. To address these limitations, grey systems theory offers a robust alternative.

As described by Febrina et al. (2025), grey systems theory is specifically designed to model uncertainty and incomplete information. It has proven effective in socio-environmental decision-making processes where data are limited (Liu & Lin, 2011). Unlike fuzzy logic or probabilistic models, grey models operate on both known and unknown (or “grey”) variables, making them particularly suitable for urban systems marked by partial observability and monitoring gaps.

According to grey systems theory, grey clustering enables classification of objects such as air quality observations into predefined grey classes using whitenization functions. This approach is ideal for cities like Lima, where pollutants such as PM_2.5 and PM₁₀ interact in non-linear and uncertain ways across the spatial and temporal scales.

CTWF is a technique that enhances the interpretability of grey clustering by assigning triangular membership curves to each pollutant category. CTWF has been successfully applied in domains such as water quality (Delgado et al., 2018), evaluation of social impact (Delgado & Romero, 2018), and assessment of human resources (Li, 2013), but has yet to be extensively used in air quality frameworks. This study adapted CTWF to PM_2.5 and PM₁₀ data to offer a flexible model for classifying urban air quality when data uncertainty and pollutant overlap are prevalent.

3.2 Mathematical Formulation of AQI

Owing to the incorporation of uncertainty proposed by grey systems theory, the air quality assessment index for particulate matter (PM_2.5 and PM₁₀) in this study outperformed traditional models, including the Delphi method (Hwang & Kim, 2023; Maghsoudi et al., 2024) and the Analytic Hierarchy Process (AHP) (Charchaoui & El Moudden, 2024; Wang et al., 2024). The CTWF approach, a central component of grey clustering analysis, can also be integrated with other techniques, including the Shannon entropy method (Delgado & Romero, 2018), correlation analysis (Yang & Lin, 2024), and system-based Internet of Things (IoT) (Luo et al., 2024; Morales et al., 2022).

The grey clustering method implemented via incidence matrices or weight functions enables the evaluation of multiple correlated criteria (Liu et al., 2017; Lv & Liu, 2024; Tao et al., 2020). This work adopted the CTWF approach, which is particularly suitable for classifying levels of air quality based on the measurement of PM_2.5 and PM₁₀, as shown in Figure 1.

Figure 1. Grey clustering-based AQI

The CTWF approach follows a six-step structured sequence adapted from previous literature (Delgado et al., 2018; Zeng, 2022; Zhao et al., 2015). It is summarized below.

Let:

Objects (m): number of monitoring sites (e.g., urban stations)

Criteria (n): number of pollutants (PM_2.5 and PM₁₀)

Grey classes (s): number of air quality classes (e.g., Good, Moderate, Unhealthy)

x_ij: raw value of pollutant 𝑗 at site 𝑖

Step 1: Definition of grey class

Air quality thresholds were derived from Peruvian environmental quality standards (EQS). Each class midpoint (𝜆) was computed as the mean of the upper and lower bounds of the legal range. For example, PM₁₀ class midpoints were 27.0, 104.5, 204.5, and 304.5 µg/m³ (MINAM, 2017).
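The midpoint rule in Step 1 can be sketched in a few lines of Python. The class ranges below are copied from the grey-class tables given in the case study; the function name `class_midpoints` is illustrative and not part of the original study.

```python
# Class ranges in µg/m³, copied from the article's grey-class tables.
PM10_RANGES = [(0, 54), (55, 154), (155, 254), (255, 354)]
PM25_RANGES = [(0, 12), (12.1, 35.4), (35.5, 55.4), (55.5, 150.4)]


def class_midpoints(ranges):
    """Return the centre point (lambda) of each (lower, upper) class range."""
    return [(lower + upper) / 2 for lower, upper in ranges]


pm10_centres = class_midpoints(PM10_RANGES)
pm25_centres = class_midpoints(PM25_RANGES)

print(pm10_centres)  # [27.0, 104.5, 204.5, 304.5]
print([round(c, 2) for c in pm25_centres])
```

For PM_2.5 the unrounded midpoints are approximately 6.0, 23.75, 45.45, and 102.95, which the article reports to one decimal place.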

Step 2: Data normalization

To ensure dimensionless processing, each value is normalized by (1):

$x_{i j}^{\prime}=\frac{x_{i j}}{\bar{x}_j}$

(1)

where $\bar{x}_j$ is the mean of the class midpoints for pollutant $j$.
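As a minimal sketch of (1), the snippet below divides both the class midpoints and one monitoring value by the mean of the midpoints. It uses the PM₁₀ midpoints from Step 1 and the PM₁₀ value reported for station CRB in the case study; the helper name `normalize` is an assumption made for illustration.

```python
def normalize(values, centres):
    """Divide each value by the arithmetic mean of the class centres."""
    mean_centre = sum(centres) / len(centres)
    return [v / mean_centre for v in values]


pm10_centres = [27.0, 104.5, 204.5, 304.5]  # class midpoints for PM10, µg/m³
crb_pm10 = 41.0                             # PM10 at station CRB, µg/m³

norm_centres = normalize(pm10_centres, pm10_centres)
norm_crb = normalize([crb_pm10], pm10_centres)[0]

print([round(c, 2) for c in norm_centres])  # [0.17, 0.65, 1.28, 1.9]
print(round(norm_crb, 2))                   # 0.26
```

These values agree with the non-dimensioned PM₁₀ entries tabulated in the case study.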

Step 3: Membership function using CTWF

Each pollutant class 𝑘 is assigned a triangular function by (2)–(4):

$f_j^1\left(x_{i j}\right)= \begin{cases}0, & x \notin\left[0, \lambda_2\right] \\ 1, & x \in\left[0, \lambda_1\right) \\ \frac{\lambda_2-x}{\lambda_2-\lambda_1}, & x \in\left[\lambda_1, \lambda_2\right]\end{cases}$

(2)

$f_j^k\left(x_{i j}\right)=\left\{\begin{array}{cc}0, & x \notin\left[\lambda_{k-1}, \lambda_{k+1}\right] \\ \frac{x-\lambda_{k-1}}{\lambda_k-\lambda_{k-1}}, & x \in\left[\lambda_{k-1}, \lambda_k\right) \\ \frac{\lambda_{k+1}-x}{\lambda_{k+1}-\lambda_k}, & x \in\left[\lambda_k, \lambda_{k+1}\right]\end{array}\right.$

(3)

$f_j^s\left(x_{i j}\right)=\left\{\begin{array}{cc}0, & x \notin\left[\lambda_{\mathrm{s}-1},+\infty\right) \\ \frac{x-\lambda_{s-1}}{\lambda_s-\lambda_{s-1}}, & x \in\left[\lambda_{\mathrm{s}-1}, \lambda_s\right) \\ 1, & x \in\left[\lambda_{\mathrm{s}},+\infty\right)\end{array}\right.$

(4)
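The three cases in (2)–(4) can be expressed as one function, sketched below under the assumption that the class centres are supplied in ascending order. Because every branch is a ratio of differences, dividing 𝑥 and the centres by the same mean (Step 2) leaves the memberships unchanged, so the example works directly in µg/m³. The function name `ctwf` is illustrative.

```python
def ctwf(x, centres):
    """Return the membership of x in each grey class.

    centres must be in ascending order (lambda_1 < ... < lambda_s).
    """
    memberships = [0.0] * len(centres)
    if x <= centres[0]:        # at or below the first centre: fully class 1
        memberships[0] = 1.0
    elif x >= centres[-1]:     # at or above the last centre: fully class s
        memberships[-1] = 1.0
    else:
        for k in range(len(centres) - 1):
            lower, upper = centres[k], centres[k + 1]
            if lower <= x <= upper:
                # Falling edge of the lower class, rising edge of the upper class.
                memberships[k] = (upper - x) / (upper - lower)
                memberships[k + 1] = (x - lower) / (upper - lower)
                break
    return memberships


pm10_centres = [27.0, 104.5, 204.5, 304.5]
crb = ctwf(41.0, pm10_centres)          # PM10 at station CRB
print([round(v, 3) for v in crb])       # [0.819, 0.181, 0.0, 0.0]
```

A value lying between two centres belongs only to the two adjacent classes, and its two memberships sum to one.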

Step 4: Grey weight for each criterion

To avoid bias from unequal pollutant influence, weights $\eta_j^k$ are calculated by (5):

$\eta_j^k=\frac{\left(\frac{1}{\lambda_j^k}\right)}{\sum_{j=1}^n\left(\frac{1}{\lambda_j^k}\right)}$

(5)
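The weighting in (5) can be sketched as follows. The example assumes the non-dimensioned class centres are obtained as in Step 2 from the one-decimal midpoints reported in the case study; the names `grey_weights` and `normalize` are illustrative.

```python
def normalize(centres):
    """Divide each class centre by the arithmetic mean of the centres."""
    mean_centre = sum(centres) / len(centres)
    return [c / mean_centre for c in centres]


def grey_weights(norm_centres):
    """Weight of each criterion per class: inverse centre, normalized across criteria.

    norm_centres maps a criterion name to its list of non-dimensioned centres.
    """
    names = list(norm_centres)
    n_classes = len(norm_centres[names[0]])
    weights = {name: [] for name in names}
    for k in range(n_classes):
        inverse = {name: 1.0 / norm_centres[name][k] for name in names}
        total = sum(inverse.values())
        for name in names:
            weights[name].append(inverse[name] / total)
    return weights


centres = {
    "PM2.5": normalize([6.0, 23.8, 45.4, 103.0]),
    "PM10": normalize([27.0, 104.5, 204.5, 304.5]),
}
weights = grey_weights(centres)
rounded = {name: [round(w, 2) for w in ws] for name, ws in weights.items()}
print(rounded)
```

Within each class the weights of the two pollutants sum to one, and the pollutant with the smaller non-dimensioned centre receives the larger weight.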

Step 5: Clustering coefficient

The clustering coefficient $\sigma_i^k$ for observation 𝑖 in class 𝑘 is calculated by (6):

$\sigma_i^k=\sum_{j=1}^n\left[f_j^k\left(x_{i j}\right) \cdot \eta_j^k\right]$

(6)
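A short sketch of (6) is given below. It takes the CTWF memberships and the two-decimal weights reported for station CRB in the case study as inputs; the function name `clustering_coefficients` is illustrative.

```python
def clustering_coefficients(memberships, weights):
    """Weighted sum over criteria of the CTWF memberships, one value per class."""
    criteria = list(memberships)
    n_classes = len(memberships[criteria[0]])
    return [
        sum(memberships[c][k] * weights[c][k] for c in criteria)
        for k in range(n_classes)
    ]


# CTWF memberships of station CRB and criterion weights, as tabulated in the article.
crb_memberships = {
    "PM2.5": [0.000, 0.765, 0.235, 0.000],
    "PM10": [0.819, 0.181, 0.000, 0.000],
}
weights = {
    "PM2.5": [0.56, 0.55, 0.56, 0.45],
    "PM10": [0.44, 0.45, 0.44, 0.55],
}

sigma = clustering_coefficients(crb_memberships, weights)
print([round(v, 3) for v in sigma])
```

Because the weights are entered here with only two decimals, the output can differ from the tabulated coefficients in the third decimal place.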

Step 6: Classification rule

The air quality class assigned to each site 𝑖 corresponds to the class 𝑘* with the highest clustering coefficient determined by (7):

$\sigma_i^{k^*}=\max _{1 \leq k \leq s}\left\{\sigma_i^k\right\}$

(7)
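The rule in (7) amounts to an argmax over the clustering coefficients. The sketch below uses the coefficients reported for station CRB; the class labels follow the grey-class tables, and the tie-breaking behaviour (the first, cleaner class wins) is an assumption of this sketch rather than something specified in the article.

```python
CLASS_LABELS = ["Good", "Moderate", "Unhealthy for sensitive groups", "Unhealthy"]


def classify(sigma, labels=CLASS_LABELS):
    """Return (label, coefficient) of the class with the largest coefficient."""
    best = max(range(len(sigma)), key=lambda k: sigma[k])  # first maximum wins on ties
    return labels[best], sigma[best]


crb_sigma = [0.364, 0.502, 0.131, 0.000]
print(classify(crb_sigma))  # ('Moderate', 0.502)
```

Returning the winning coefficient alongside the label keeps the membership strength available for ranking sites within the same class.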

3.3 Transferability and Broader Application

Although this model was calibrated using Peruvian environmental thresholds, its architecture is entirely modular. The CTWF grey clustering process can be adapted to any region by modifying the grey classes according to local or international air quality regulations (e.g., World Health Organization, EPA, Ministry of Ecology and Environment in China). The index is particularly valuable in cities with sparse sensor networks, inconsistent monitoring, or fragmented regulatory frameworks, as these conditions prevail in many developing countries. Its low data requirement and high interpretability enable it to be a promising diagnostic tool for sustainable environmental governance.

Unlike previous applications of grey clustering in environmental analysis, this study adapted the CTWF formulation for multi-pollutant urban air quality assessment. The method integrates simultaneous PM_2.5 and PM₁₀ concentrations, recalibrates class thresholds based on both national (Peruvian) and international (EPA) AQI standards, and introduces weighting adjustments to account for data uncertainty and incomplete monitoring. These adaptations extend the conventional CTWF grey clustering approach, turning it into a robust and transferable diagnostic tool for urban environmental planning under data-scarce conditions.

4. Case Study in Lima, Peru

4.1 Urban Context and Justifications

Lima, the capital of Peru and a complex urban ecosystem, is home to over 10 million residents. It has undergone rapid and often unplanned urban expansion, leading to land-use conflicts, fragmented transport systems, and severe environmental degradation. In the city, the dense fleet of over 1.6 million vehicles, many old and poorly regulated, contributes significantly to air pollution by PM_2.5 and PM₁₀. These conditions are compounded by the geographic position of Lima in a coastal desert basin, where atmospheric inversions trap pollutants during the dry season (MINAM, 2017).

These challenges turn Lima into a critical test case for air quality models. It is one of the most polluted capitals in Latin America (World Health Organization, 2021), and features high socio-environmental inequality: industrial corridors are often adjacent to informal housing, and low-income districts bear the brunt of pollutant exposure. Moreover, the limited public environmental infrastructure and monitoring coverage in Lima highlight its need for data-efficient and adaptive diagnostic tools like the grey clustering method in this study.

4.2 Monitoring Network and Data Processing

Nine air quality monitoring stations were selected to represent the diverse urban conditions in Lima, as they are distributed across districts with varied land use, density, and socioeconomic profiles. Figure 2 illustrates the geographic distribution of the nine monitoring stations across Lima. The map shows the spatial diversity of the selected sites, ranging from peripheral districts such as Carabayllo and Villa María del Triunfo to central areas like San Borja and Campo de Marte. This spatial configuration captures contrasting urban environments, from industrial and high-traffic zones to residential and green areas, allowing a representative assessment of air quality variability across the city.

Figure 2. Monitoring points in Lima, Peru (Ministerio del Ambiente-SENAMHI, 2024)

4.3 Definition of Study Objects ($m$)

Data were obtained from SENAMHI (Servicio Nacional de Meteorología e Hidrología del Perú) and corresponded to daily average concentrations of PM_2.5 and PM₁₀ for February 2024, a month characterized by high stagnation and dry conditions (Ministerio del Ambiente-SENAMHI, 2024). The data were quality-checked, and only stations with over 95% completeness were retained. The locations of the monitoring stations are presented in Table 1, and the collected data are presented in Table 2.

Table 1. Study objects in the case study

Num (m)	Station	Location	Code
1	Carabayllo	Peripheral, low-density sprawl	CRB
2	San Martín de Porres	Central-north, residential	SMP
3	San Juan de Lurigancho	High-density, high-traffic	SJL
4	Ceres	Eastern growing urban frontiers	CRS
5	Pariachi	Eastern growing urban frontiers	PAR
6	Santa Anita	Logistics-industrial zone	STA
7	Villa María del Triunfo	Southern low-income periphery	VMT
8	San Borja	Middle-upper class residential area	SBJ
9	Campo de Marte	Central urban park	CMD

Table 2. Data from the case study (μg/m³)

Object	CRB	SMP	SJL	CRS	STA	PAR	SBJ	CMD	VMT
PM_2.5	28.8	9.6	18.4	28.9	15.3	27.4	15.1	11.9	15.2
PM₁₀	41.0	20.3	39.1	58.0	25.5	54.2	50.6	24.5	33.7

4.4 Definition of Criteria ($n$)

Two criteria were established in this study, corresponding to the two particulate matter (PM) fractions that affect air quality and are monitored in Lima, Peru, as presented in Table 3.

Table 3. Criteria for the case study

Number (n)	Criterion	Description
1	PM₁₀	Particulate matter with a diameter of less than 10 microns
2	PM_2.5	Particulate matter with a diameter of less than 2.5 microns

4.5 Definition of Grey Classes ($s$)

The grey classes were defined according to Peruvian legislation, specifically the environmental quality standards for PM₁₀ and PM_2.5 (Ministerio del Ambiente-SENAMHI, 2024), and are presented in Table 4 and Table 5.

Table 4. Grey classes for PM₁₀

Number (s)	Range (μg/m³)	Description
1	0 - 54	Good
2	55 - 154	Moderate
3	155 - 254	Unhealthy for sensitive groups
4	255 - 354	Unhealthy

Table 5. Grey classes for PM_2.5

Number (s)	Range (μg/m³)	Description
1	0 - 12	Good
2	12.1 – 35.4	Moderate
3	35.5 – 55.4	Unhealthy for sensitive groups
4	55.5 – 150.4	Unhealthy

4.6 Calculations Using the Steps of the AQI

Step 1: From Table 4 and Table 5, the centre points of the grey classes were determined. The results are presented in Table 6.

Table 6. Centre points of the grey classes

Number (s)	PM₁₀	PM_2.5	Code	Description
1	27.0	6.0	λ₁	Good
2	104.5	23.8	λ₂	Moderate
3	204.5	45.4	λ₃	Unhealthy for sensitive groups
4	304.5	103.0	λ₄	Unhealthy

Step 2: The monitoring values from Table 2 and the standard values from Table 6 were made dimensionless by dividing both by the arithmetic mean of the standard values for each pollutant. The results are presented in Table 7, Table 8, and Table 9.

Table 7. Arithmetic means for standard values

Criterion	λ₁	λ₂	λ₃	λ₄	Arithmetic Mean
PM_2.5	6.0	23.8	45.4	103.0	44.5
PM₁₀	27.0	104.5	204.5	304.5	160.1

Table 8. Non-dimensioned values for standard values

Criterion	λ₁	λ₂	λ₃	λ₄
PM_2.5	0.13	0.53	1.02	2.31
PM₁₀	0.17	0.65	1.28	1.90

Table 9. Non-dimensioned values for monitoring values

Object	CRB	SMP	SJL	CRS	STA	PAR	SBJ	CMD	VMT
PM_2.5	0.65	0.22	0.41	0.65	0.34	0.62	0.34	0.27	0.34
PM₁₀	0.26	0.13	0.24	0.36	0.16	0.34	0.32	0.15	0.21

Step 3: CTWF were determined according to Figure 1 and (2)–(4). As an example, the results for the first object (CRB) are presented in Section 5 and (8)–(11).

Then, the values from Table 9 for CRB were substituted into (8)–(11) to obtain the CTWF values of CRB. The results are presented in Table 10.

$f_j^1\left(x_{i j}\right)=\left\{\begin{array}{cc}0, & x \notin\left[0, \lambda_2\right] \\ 1, & x \in\left[0, \lambda_1\right) \\ \frac{\lambda_2-x}{\lambda_2-\lambda_1}, & x \in\left[\lambda_1, \lambda_2\right]\end{array}\right.$

(8)

$f_j^2\left(x_{i j}\right)=\left\{\begin{array}{lc}0, & x \notin\left[\lambda_1, \lambda_3\right] \\ \frac{x-\lambda_1}{\lambda_2-\lambda_1}, & x \in\left[\lambda_1, \lambda_2\right) \\ \frac{\lambda_3-x}{\lambda_3-\lambda_2}, & x \in\left[\lambda_2, \lambda_3\right]\end{array}\right.$

(9)

$f_j^3\left(x_{i j}\right)=\left\{\begin{array}{cc}0, & x \notin\left[\lambda_2, \lambda_4\right] \\ \frac{x-\lambda_2}{\lambda_3-\lambda_2}, & x \in\left[\lambda_2, \lambda_3\right) \\ \frac{\lambda_4-x}{\lambda_4-\lambda_3}, & x \in\left[\lambda_3, \lambda_4\right]\end{array}\right.$

(10)

$f_j^4\left(x_{i j}\right)=\left\{\begin{array}{cc}0, & x \notin\left[\lambda_3,+\infty\right) \\ \frac{x-\lambda_3}{\lambda_4-\lambda_3}, & x \in\left[\lambda_3, \lambda_4\right) \\ 1, & x \in\left[\lambda_4,+\infty\right)\end{array}\right.$

(11)

Table 10. CTWF values of CRB

Grey class	PM_2.5	PM₁₀
λ₁	0.000	0.819
λ₂	0.765	0.181
λ₃	0.235	0.000
λ₄	0.000	0.000
Sum	1.000	1.000

Step 4: The weight of the criteria was calculated using (5) with the values from Table 8. The results are presented in Table 11.

Step 5: The cluster coefficient was calculated using (6) with the values from Table 10 and Table 11. As an example, the results for the first object (CRB) are presented in Table 12.

Table 11. Values of the weight of the criteria

Criterion	λ₁	λ₂	λ₃	λ₄
PM_2.5	0.56	0.55	0.56	0.45
PM₁₀	0.44	0.45	0.44	0.55

Table 12. Cluster coefficients for the first object (CRB)

Grey class	PM_2.5	PM₁₀	$\sigma_i^k$
λ₁	0.000	0.819	0.364
λ₂	0.765	0.181	0.502
λ₃	0.235	0.000	0.131
λ₄	0.000	0.000	0.000

Step 6: The maximum value of the cluster coefficient was determined using (7) with the values from Table 12. As an example, the result for the first object (CRB) was $\sigma_i^{k^*}$ = 0.502. The same procedure was applied to all other objects, and the results are presented in Table 13.

Table 13. Cluster coefficients for all objects

Object	λ₁	λ₂	λ₃	λ₄
CRB	0.364	0.502	0.131	0.000
SMP	0.887	0.112	0.000	0.000
SJL	0.541	0.456	0.000	0.000
CRS	0.266	0.599	0.133	0.000
STA	0.710	0.287	0.000	0.000
PAR	0.288	0.614	0.095	0.000
SBJ	0.580	0.419	0.000	0.000
CMD	0.815	0.184	0.000	0.000
VMT	0.675	0.323	0.000	0.000
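The six steps can be chained into one short script, sketched below as an illustration rather than as the code used in the study. It takes the Table 2 concentrations and the Table 6 centre points as inputs; all function and variable names are assumptions. The memberships are evaluated on raw concentrations because the triangular functions are unaffected by dividing values and centres by the same mean.

```python
STATIONS = ["CRB", "SMP", "SJL", "CRS", "STA", "PAR", "SBJ", "CMD", "VMT"]
PM25 = [28.8, 9.6, 18.4, 28.9, 15.3, 27.4, 15.1, 11.9, 15.2]   # Table 2, µg/m³
PM10 = [41.0, 20.3, 39.1, 58.0, 25.5, 54.2, 50.6, 24.5, 33.7]  # Table 2, µg/m³
CENTRES = {"PM2.5": [6.0, 23.8, 45.4, 103.0],                  # Table 6
           "PM10": [27.0, 104.5, 204.5, 304.5]}
LABELS = ["Good", "Moderate", "Unhealthy for sensitive groups", "Unhealthy"]


def ctwf(x, centres):
    """Triangular membership of x in each class (centres in ascending order)."""
    out = [0.0] * len(centres)
    if x <= centres[0]:
        out[0] = 1.0
    elif x >= centres[-1]:
        out[-1] = 1.0
    else:
        for k in range(len(centres) - 1):
            lo, hi = centres[k], centres[k + 1]
            if lo <= x <= hi:
                out[k] = (hi - x) / (hi - lo)
                out[k + 1] = (x - lo) / (hi - lo)
                break
    return out


def grey_weights(centres):
    """Per-class criterion weights from the inverse of the non-dimensioned centres."""
    norm = {c: [v / (sum(vals) / len(vals)) for v in vals] for c, vals in centres.items()}
    n_classes = len(next(iter(norm.values())))
    weights = {c: [] for c in norm}
    for k in range(n_classes):
        total = sum(1.0 / norm[c][k] for c in norm)
        for c in norm:
            weights[c].append((1.0 / norm[c][k]) / total)
    return weights


def clustering_coefficients(observation, centres, weights):
    """Weighted sum of memberships over criteria, one coefficient per class."""
    n_classes = len(next(iter(centres.values())))
    member = {c: ctwf(observation[c], centres[c]) for c in centres}
    return [sum(member[c][k] * weights[c][k] for c in centres) for k in range(n_classes)]


weights = grey_weights(CENTRES)
results = {}
for name, pm25, pm10 in zip(STATIONS, PM25, PM10):
    sigma = clustering_coefficients({"PM2.5": pm25, "PM10": pm10}, CENTRES, weights)
    best = max(range(len(sigma)), key=lambda k: sigma[k])
    results[name] = (LABELS[best], [round(v, 3) for v in sigma])

for name, (label, sigma) in results.items():
    print(name, label, sigma)
```

Under these assumptions the coefficients agree with Table 13 to within a few thousandths; the residual differences are consistent with intermediate rounding in the tabulated values.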

5. Results and Discussion

5.1 Performance of the Index in Air Quality Classification

The levels of air quality for nine districts in Lima were classified based on PM_2.5 and PM₁₀ concentrations during February 2024, with the CTWF-based grey clustering approach. These classifications, shown in Figure 3 and Figure 4, reflect both the categorical level and the relative membership strength within each grey class (Delgado et al., 2018).

Figure 3 presents locations categorized under λ₁ (Good Air Quality): SMP (San Martín de Porres), CMD (Campo de Marte), STA (Santa Anita), VMT (Villa María del Triunfo), SBJ (San Borja), and SJL (San Juan de Lurigancho). As shown in Figure 3, districts with better air quality are generally concentrated in central and southern Lima, where vegetation cover is higher and industrial activity is lower. The predominance of “Good” classifications in these zones reflects lower emission densities and favourable dispersion conditions, consistent with the city’s spatial environmental gradient. Their membership strengths follow the order:

SMP (0.887) > CMD (0.815) > STA (0.710) > VMT (0.675) > SBJ (0.580) > SJL (0.541)

This means that the SMP monitoring point has better air quality than the SJL monitoring point, although all six monitoring points fall in the Good class.

Figure 4 shows zones classified under λ₂ (Moderate Air Quality): PAR (Pariachi), CRS (Ceres), and CRB (Carabayllo), in the order:

PAR (0.614) > CRS (0.599) > CRB (0.502)

In addition, Figure 4 highlights that moderate air quality conditions are more prevalent in the eastern and northern peripheries of Lima, where rapid urban expansion, industrial corridors, and limited green coverage contribute to higher PM_2.5 and PM₁₀ concentrations. These spatial patterns confirm the influence of land use and emission sources on the distribution of particulate pollution across the metropolitan area.

This means that the PAR monitoring point has better air quality than the CRB monitoring point, although all three monitoring points fall in the Moderate class.

These results offered more nuance than conventional AQIs by revealing how strongly a location fits into a category. For example, while SMP and SJL are both “Good”, SMP’s λ₁ coefficient is substantially higher, suggesting cleaner air within the same classification (Liu & Lin, 2011).

Figure 3. Monitoring points with good air quality

Figure 4. Monitoring points with a moderate level of air quality

5.2 Comparative Benchmarking: EPA AQI and Peruvian EQS

The output of the grey clustering model was benchmarked against two frameworks for reference:

5.2.1 The U.S. EPA AQI

The U.S. EPA AQI uses a single-dominant pollutant approach to determine the categories of daily air quality (EPA, 2024). In several Lima districts, such as San Borja (SBJ) and Villa María del Triunfo (VMT), the EPA AQI classified air quality as “Moderate” on days when the levels of either PM_2.5 or PM₁₀ exceeded recommended thresholds. In contrast, the grey clustering model integrated both pollutants, resulting in a “Good” classification when the combined risk was low. This illustrates the multi-pollutant sensitivity of the model and its ability to offer a more nuanced assessment under borderline conditions.
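The contrast with a dominant-pollutant rule can be illustrated with a simplified sketch. It does not compute the EPA AQI itself (which interpolates a sub-index between breakpoints); it only assigns each pollutant to a category and keeps the worse one. The breakpoints are assumed to be the class ranges of Table 4 and Table 5, values between a listed upper bound and the next lower bound are assigned to the lower class, and the inputs are the Table 2 concentrations rather than individual days. The grey clustering classes are those implied by the maxima in Table 13.

```python
from bisect import bisect_right

LABELS = ["Good", "Moderate", "Unhealthy for sensitive groups", "Unhealthy"]
# Lower bounds of classes 2-4 in µg/m³ (assumed breakpoints, from Tables 4 and 5).
LOWER_BOUNDS = {"PM2.5": [12.1, 35.5, 55.5], "PM10": [55, 155, 255]}

STATIONS = ["CRB", "SMP", "SJL", "CRS", "STA", "PAR", "SBJ", "CMD", "VMT"]
PM25 = [28.8, 9.6, 18.4, 28.9, 15.3, 27.4, 15.1, 11.9, 15.2]
PM10 = [41.0, 20.3, 39.1, 58.0, 25.5, 54.2, 50.6, 24.5, 33.7]

# Grey clustering classes for the same stations (largest coefficient in Table 13).
GREY_CLASS = {"CRB": "Moderate", "SMP": "Good", "SJL": "Good", "CRS": "Moderate",
              "STA": "Good", "PAR": "Moderate", "SBJ": "Good", "CMD": "Good",
              "VMT": "Good"}


def pollutant_class(pollutant, value):
    """Index of the class whose range contains the value."""
    return bisect_right(LOWER_BOUNDS[pollutant], value)


def dominant_class(pm25, pm10):
    """Category of the worse of the two pollutants (dominance rule)."""
    worst = max(pollutant_class("PM2.5", pm25), pollutant_class("PM10", pm10))
    return LABELS[worst]


differences = [name for name, a, b in zip(STATIONS, PM25, PM10)
               if dominant_class(a, b) != GREY_CLASS[name]]
print(differences)  # ['SJL', 'STA', 'SBJ', 'VMT']
```

In this sketch the stations that differ are rated Moderate by the dominance rule because of PM_2.5 alone, while the combined coefficients place them in the Good class.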

5.2.2 Peruvian environmental quality standards (EQS)

The Peruvian EQS served as legal anchors to define the limits of the grey classes used in the model (MINAM, 2017). Some stations, such as Pariachi (PAR), were classified as “Moderate” despite occasionally exceeding the PM₁₀ threshold. This reflects the flexibility of the model to accommodate variability and borderline cases, providing a more continuous and adaptive classification than rigid binary exceedance approaches.

5.3 Insights into Urban Planning

The spatial distribution of air quality classifications across Lima revealed clear patterns consistent with the well-documented environmental inequalities in the city (Mampitiya et al., 2024). Districts such as San Martín de Porres (SMP) and Comas (CMD), which benefit from greater vegetation cover and reduced vehicular congestion, consistently showed higher λ₁ values, indicating better air quality. In contrast, districts on the eastern and northern peripheries, including Parque de las Leyendas (PAR) and Carabayllo (CRS), reflected the effects of ongoing urban sprawl, industrial encroachment, and insufficient air quality governance.

These findings suggest important opportunities for targeted environmental interventions. Areas exhibiting high λ₂ values could be prioritized for emission control strategies, such as restricting heavy freight traffic during peak hours, enhancing urban green infrastructure, or promoting public transport alternatives. A key advantage of the grey clustering index is its ability to provide continuous intra-category differentiation, allowing policymakers to identify and prioritize higher-risk zones even within the same regulatory category.

5.4 Sensitivity and Robustness Analysis

To evaluate the robustness of the proposed grey clustering-based AQI, this study conducted a qualitative sensitivity assessment of the influence of class thresholds and weighting coefficients on classification outcomes.

The results of the model are inherently dependent on the definition of grey class boundaries and the relative weights assigned to PM_2.5 and PM₁₀. Small variations in these parameters, such as adjusting the midpoint values of the Peruvian environmental quality standards or modifying pollutant weights within ±10%, could shift individual monitoring stations between adjacent categories (e.g., from Good to Moderate). However, the comparative ranking of sites and the overall spatial pattern of air quality across Lima remain stable, indicating that the model is robust to moderate parameter perturbations.
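A quantitative version of this check could be sketched as follows. The sketch assumes the standard CTWF form described by Liu & Lin (2011), class-independent pollutant weights with a baseline of 0.5 each, and entirely hypothetical class labels, class centres, station names and concentrations; none of these values come from the Lima dataset.

```python
CLASSES = ["Good", "Moderate", "Poor"]              # hypothetical labels
CENTRES = {                                          # hypothetical centres, µg/m³
    "PM2.5": [12.5, 37.5, 75.0],
    "PM10": [25.0, 75.0, 150.0],
}
SITES = {                                            # hypothetical observations, µg/m³
    "A": {"PM2.5": 14.0, "PM10": 35.0},
    "B": {"PM2.5": 15.0, "PM10": 69.0},
    "C": {"PM2.5": 30.0, "PM10": 70.0},
}


def ctwf(x, lam, k):
    """Centre-point triangular whitenization weight of value x for class k."""
    last = len(lam) - 1
    if k == 0 and x <= lam[0]:
        return 1.0                                   # below the first centre
    if k == last and x >= lam[last]:
        return 1.0                                   # above the last centre
    if k > 0 and lam[k - 1] < x <= lam[k]:
        return (x - lam[k - 1]) / (lam[k] - lam[k - 1])   # rising side
    if k < last and lam[k] < x < lam[k + 1]:
        return (lam[k + 1] - x) / (lam[k + 1] - lam[k])   # falling side
    return 0.0


def coefficients(obs, weights):
    """Clustering coefficient of one site for every class (weighted sum of CTWF values)."""
    return [
        sum(weights[p] * ctwf(obs[p], CENTRES[p], k) for p in weights)
        for k in range(len(CLASSES))
    ]


def run(delta):
    """Classify all sites after perturbing the PM2.5 weight by a relative delta."""
    w = 0.5 * (1 + delta)
    weights = {"PM2.5": w, "PM10": 1 - w}
    sigma = {name: coefficients(obs, weights) for name, obs in SITES.items()}
    labels = {
        name: CLASSES[max(range(len(CLASSES)), key=s.__getitem__)]
        for name, s in sigma.items()
    }
    # Rank sites from cleanest to dirtiest by their coefficient for the first class.
    ranking = sorted(sigma, key=lambda name: sigma[name][0], reverse=True)
    return labels, ranking


results = {delta: run(delta) for delta in (-0.10, 0.0, 0.10)}
for delta, (labels, ranking) in results.items():
    print(f"{delta:+.2f}", labels, ranking)
```

With these hypothetical inputs, the borderline site B moves from Good to Moderate when the PM_2.5 weight is reduced by 10%, while the ordering of the three sites is unchanged in all three runs.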

Moreover, the use of CTWF inherently smooths transitions between classes, thus reducing the impact of abrupt threshold changes. This characteristic enhances the reliability of the model under uncertain or borderline conditions and supports its transferability to other regions with different regulatory standards. Future quantitative sensitivity testing, incorporating additional pollutants and multi-seasonal data, would further strengthen these findings and confirm the stability of the model under varying inputs.
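The smoothing effect can be seen by comparing a crisp cut-off with the CTWF weight of the first class around the same concentration. The sketch below is illustrative only: the two class centres and the crisp limit are hypothetical values, and only the first-class branch of the CTWF is written out, in clamped form.

```python
LAM_GOOD, LAM_MODERATE = 12.5, 37.5   # hypothetical class centres, µg/m³
HARD_LIMIT = 25.0                      # hypothetical crisp cut-off, µg/m³


def good_membership(x):
    """First-class CTWF weight: 1 up to its centre, then linear decay to 0 at the next centre."""
    return min(1.0, max(0.0, (LAM_MODERATE - x) / (LAM_MODERATE - LAM_GOOD)))


def hard_label(x):
    """Binary exceedance rule for comparison."""
    return "Good" if x <= HARD_LIMIT else "Moderate"


for x in (24.0, 25.0, 26.0):
    print(x, hard_label(x), round(good_membership(x), 2))
# 24.0 Good 0.54
# 25.0 Good 0.5
# 26.0 Moderate 0.46
```

A change of 2 µg/m³ across the cut-off flips the crisp label, whereas the membership value moves by only 0.08, so a small shift in the threshold has a correspondingly small effect on the clustering coefficient.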

5.5 Strengths and Limitations

The strengths and limitations are as follows:

Strengths:

The simultaneous inclusion of PM_2.5 and PM₁₀ improves representativeness compared to single-pollutant indices.
Grey clustering handles uncertain or missing data effectively, making it suitable for cities with incomplete or fragmented monitoring networks.
The model architecture can be readily adapted to include additional pollutants or to comply with local regulatory standards with minimal structural modifications.

Limitations:

Class boundaries are dependent on legal standards, which may differ from international recommendations such as those established by the World Health Organization.
The current model evaluates daily or period-average values and does not reflect short-term spikes or intra-day fluctuations.
The model currently covers only PM_2.5 and PM₁₀; expanding it to include co-pollutants such as ozone (O₃), nitrogen dioxide (NO₂), or sulphur dioxide (SO₂) would further enhance its diagnostic capabilities and robustness.
Another important limitation of this study is the temporal scope of the dataset. The analysis is based exclusively on daily PM_2.5 and PM₁₀ data collected during February 2024, which represents a single-month snapshot rather than a multi-seasonal record. Consequently, the findings should be interpreted as illustrative rather than conclusive, reflecting short-term spatial patterns rather than long-term trends. Future research incorporating longitudinal or multi-seasonal datasets would enable a comprehensive assessment of temporal variability and the robustness of the model.

6. Conclusions

This study proposed and validated an innovative AQI based on grey systems theory, utilizing CTWF to assess urban air quality in data-limited environments. By jointly evaluating PM_2.5 and PM₁₀ concentrations through a clustering framework, the model demonstrated a robust capacity to classify pollution levels with greater flexibility and nuance than traditional AQIs. Unlike conventional approaches that rely on single-dominant pollutant rules and rigid thresholding, the proposed index enabled a continuous and adaptive assessment of air quality. This characteristic is particularly valuable for cities where pollutant interactions are complex and regulation is limited.

When applied to nine districts of Lima, Peru, the model successfully captured spatial disparities in air pollution levels and provided actionable insights for urban planning and environmental management. The index was able to highlight intra-category differences within the same level of air quality, thus offering a valuable tool for targeting interventions in vulnerable and high-risk neighbourhoods. Comparative benchmarking with the U.S. EPA AQI and Peruvian environmental standards confirmed the consistency of the model while highlighting its advantages in handling multiple pollutants and borderline cases.

More importantly, the grey clustering framework is not only accurate but also highly transferable. Its modular design allows easy recalibration to other pollutants, monitoring networks, or regulatory standards, rendering it an effective and scalable diagnostic tool for supporting Sustainable Development Goals (SDGs), particularly SDG 3 (Good Health and Well-Being) and SDG 11 (Sustainable Cities and Communities).

Future research should focus on expanding the model to include additional pollutants such as nitrogen dioxide (NO₂), ozone (O₃), sulphur dioxide (SO₂), and potentially noise or light pollution. Incorporating temporal dynamics to reflect hourly and seasonal variability would further enhance the accuracy and decision-making capacity of the model. In summary, the proposed index represents a promising and policy-relevant contribution to the environmental management toolkit for rapidly urbanizing regions facing monitoring gaps and regulatory fragmentation.

Data Availability

The data used to support the research findings are available from the corresponding author upon request.

Conflicts of Interest

The author declares no conflict of interest.

References

Altahaan, Z. & Dobslaw, D. (2025). Post-war air quality index in Mosul City, Iraq: Does war still have an impact on air quality today? Atmosphere, 16(2), 135.

Aman, N., Panyametheekul, S., Pawarmart, I., Xian, D., Gao, L., Tian, L., Manomaiphiboon, K., & Wang, Y. (2025). Machine learning-based quantification and separation of emissions and meteorological effects on PM_2.5 in Greater Bangkok. Sci. Rep., 15, 14775.

Basir, N. I., Tan, K. K., Djarum, D. H., Ahmad, Z., Dai-Viet Vo, N., & Jie, Z. (2025). Autoencoder artificial neural network model for air pollution index prediction. IIUM Eng. J., 26(1), 49–62.

Caselles Nuñez, J. G., Contreras Negrette, O. A., de Jesús Beleño Sáenz, K., & Díaz Sáenz, C. G. (2025). Design and implementation of an indoor and outdoor air quality measurement device for the detection and monitoring of gases with hazardous health effects. Eng. Proc., 83(1), 13.

Charchaoui, M. & El Moudden, A. (2024). Optimal site selection for Solar PV Farms in Northern Morocco using the AHP approach and GIS tools. In 2024 International Conference on Circuit, Systems and Communication (ICCSC), Fes, Morocco.

Delgado, A., Aguirre, A., Palomino, E., & Salazar, G. (2018). Applying triangular whitenization weight functions to assess water quality of main affluents of Rimac river. In 2017 Electronic Congress (E-CON UNI), Lima, Peru.

Delgado, A. & Romero, I. (2017). Applying grey systems and Shannon entropy to social impact assessment and environmental conflict analysis. Int. J. Appl. Eng. Res., 12(24), 14327–14337.

Delgado, A. & Romero, I. (2018). Environmental conflict analysis on a hydrocarbon exploration project using the Shannon entropy. In 2017 Electronic Congress (E-CON UNI), Lima, Peru.

EPA. (2024). Technical Assistance Document for the Reporting of Daily Air Quality—the Air Quality Index (AQI). https://www.epa.gov/air-trends

Febrina, S., Aimon, H., Kurniadi, A. P., & Marta, J. (2025). Assessing the role of the blue economy in strengthening food security: Evidence from lower-middle-income ASEAN countries. Chall. Sustain., 13(1), 110–121.

Garini, S. A., Shiddiqi, A. M., Utama, W., & Insani, A. N. F. (2025). Filling-well: An effective technique to handle incomplete well-log data for lithology classification using machine learning algorithms. MethodsX, 14, 103127.

Granella, F., Renna, S., & Aleluia Reis, L. (2024). The formation of secondary inorganic aerosols: A data-driven investigation of Lombardy’s secondary inorganic aerosol problem. Atmos. Environ., 327, 120480.

Hwang, S. & Kim, T. (2023). An exploratory study on artifacts for cyber attack attribution considering false flag: Using Delphi and AHP methods. IEEE Access, 11, 74533–74544.

Istiana, T., Kurniawan, B., Soekirno, S., Nahas, A., Wihono, A., Nuryanto, D. E., Adi, S. P., & Hakim, M. L. (2023). Causality analysis of air quality and meteorological parameters for PM2.5 characteristics determination: Evidence from Jakarta. Aerosol Air Qual. Res., 23(9), 230014.

Karmoude, M., Munhungewarwa, B., Chiraira, I., Mckenzie, R., Kong, J., Smith, B., Ayana, G., Njara, N., Mathaha, T., Kumar, M., & Mellado, B. (2025). Machine learning for air quality prediction and data analysis: Review on recent advancements, challenges, and outlooks. In Science of The Total Environment. Elsevier.

Kazemi, K., Vernet, A., & Fabregat, A. (2025). Using open data to derive parsimonious data-driven models for uncovering the influence of local traffic and meteorology on air quality: The case of Madrid. Environ. Pollut., 383, 126691.

Kebe, M., Traore, A., Sow, M., Fall, S., & Tahri, M. (2025). Human health risk evaluation of particle air pollution (PM10 and PM2.5) and heavy metals in Dakar’s two urban areas. Asian J. Atmos. Environ., 19, 7.

Kirana, A. P., Saleh, W. A. R., Sabilla, W. I., Vista, C. B., Wakhidah, R., & Wijayaningrum, V. N. (2025). Spatio-temporal analysis and real-time air quality monitoring using historical data and laravel: A decision tree-based web GIS system. E3S Web Conf., 611, 01002.

Lakshmi, S. & Krishnamoorthy, A. (2024). Effective multi-step PM2.5 and PM10 air quality forecasting using bidirectional ConvLSTM encoder-decoder with STA mechanism. IEEE Access, 12, 179628–179647.

Li, Q. (2013). Human resources assessment based on standard triangular whitenization weight function. In Proceedings of 2013 IEEE International Conference on Grey systems and Intelligent Services (GSIS), Macao, China.

Liu, S., Lin, C., & Yang, Y. (2017). Several problems need to be studied in grey system theory. In 2017 International Conference on Grey Systems and Intelligent Services (GSIS), Stockholm, Sweden.

Liu, S. & Lin, Y. (2011). Grey Systems: Theory and Applications. Springer Berlin Heidelberg.

Luo, H., Wang, J., Lin, D., Kong, L., Zhao, Y., & Guan, Y. L. (2024). A novel energy-efficient approach based on clustering using gray prediction in WSNs for IoT infrastructures. IEEE Internet Things J., 11(14), 24748–24760.

Lv, J. & Liu, X. (2024). Equipment reliability analysis based on grey clustering. In 2024 Asia-Pacific Conference on Software Engineering, Social Network Analysis and Intelligent Computing (SSAIC), New Delhi, India.

Maghsoudi, M., Mohammadi, A., & Habibipour, S. (2024). Navigating and addressing public concerns in AI: Insights from Social media analytics and Delphi. IEEE Access, 12, 126043–126062.

Mampitiya, L., Rathnayake, N., Hoshino, Y., & Rathnayake, U. (2024). Performance of machine learning models to forecast PM10 levels. MethodsX, 12, 102557.

MINAM. (2017). Estándares de Calidad Ambiental (ECA) para Aire y establecen Disposiciones Complementarias. https://www.minam.gob.pe/wp-content/uploads/2017/04/Proyecto-de-DS-ECA-AIRE.pdf

Ministerio del Ambiente-SENAMHI. (2024). Vigilancia de calidad del aire, Área metropolitana de Lima y Callao, Febrero 2024. https://www.senamhi.gob.pe/load/file/03201SENA-131.pdf

Morales, S. A. H., Andrade-Arenas, L., Delgado, A., & Huamani, E. L. (2022). Augmented reality: Prototype for the teaching-learning process in Peru. Int. J. Adv. Comput. Sci. Appl., 13, 806–815.

Ortiz-Grisales, P. M., Gutiérrez-León, L., Duque-Grisales, E., & Zuluaga-Ríos, C. D. (2025). Dynamic modeling under temperature variations for sustainable air quality solutions: PM2.5 and negative ion interactions. Sustainability, 17(1), 70.

Qin, J., Liu, Z., Sui, W., Peng, T., & Zhao, S. (2024). Efficient group-aware graph neural network for air quality forecasting in small-scale spaces. In 2024 International Applied Computational Electromagnetics Society Symposium (ACES-China), Xi’an, China.

Roslan, S. N. M., Gohain, K., Mustafa, A. M. A. A., Ismail, M. M., & Kumaran, V. V. (2025). Designing affordable urban ecosystems: A quantitative model to enhance the quality of life for the urban poor in Malaysia through employment, housing, and digital access. Chall. Sustain., 13(1), 18–34.

Tao, J., Fang, Z., & Wang, X. (2020). Modeling the health state of complex systems with component dependency based on grey clustering. In 2020 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), Beijing, China.

Tume-Bruce, B. A. A., Delgado, A., & Huamaní, E. L. (2022). Implementation of a web system for the improvement in sales and in the application of digital marketing in the company Selcom. Int. J. Recent Innov. Trends Comput. Commun., 10(5), 48–59.

Veres, C., Bacos, I., Tănase, M., & Gabor, M. R. (2025). Analyzing urban air quality perceptions: Integrating socio-demographic patterns with sensor-based measurements using regression model and multidimensional scaling. Sustainability, 17(2), 580.

Wang, X., Wang, J., Ning, X., Tian, T., Sun, Z., & Hu, X. (2024). A method for evaluating the wildfire risk level of distribution lines based on AHP and EWM. In 2024 IEEE 19th Conference on Industrial Electronics and Applications (ICIEA), Kristiansand, Norway.

World Health Organization. (2021). WHO global air quality guidelines: particulate matter (PM2.5 and PM10), ozone, nitrogen dioxide, sulfur dioxide and carbon monoxide. https://www.who.int/publications/i/item/9789240034228

Xu, Z. & Luo, J. (2025). Sustainable urbanization: Unpacking the link between urban clusters and environmental protection. Sustainability, 17(3), 873.

Yadav, V. & Ganguly, R. (2025). Evaluation and spatial mapping of criteria air pollutants in an industrial city in India. J. Hazard. Toxic Radioact. Waste, 29(3).

Yang, Y. & Lin, G. (2024). Grey clustering correlation analysis of AC contactor closing dynamic performance using panel data. In 2024 IEEE 2nd International Conference on Control, Electronics and Computer Technology (ICCECT), Jilin, China.

Zeng, S. (2022). Application of grey clustering algorithm in wet-land ecotourism development. In 2022 International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), Bangalore, India.

Zhang, J., Zhou, Z., Huang, Q., Liu, X., Wang, B., & Hu, B. (2025a). Real-time visual monitoring and high spatiotemporal-resolution mapping of air pollutants using a drone-mass spectrometer system. Environ. Sci. Technol., 59(16), 8099–8107.

Zhang, Z., Dai, X., Xie, Z., Yu, C., Liao, Z., Li, J., Jiang, F., Liu, Y., Liu, Z., Zhang, Q., & Li, W. (2025b). Spatiotemporal variation characteristics and influencing factors of air quality in Sichuan-Chongqing region, China, 2016–2020. Environ. Earth Sci., 84.

Zhao, H. H., Jian, L. R., & Liu, Y. (2015). A novel grey clustering group decision-making model and application. In 2015 IEEE International Conference on Grey Systems and Intelligent Services (GSIS), Leicester, UK.

Nomenclature

$f_j^k\left(x_{i j}\right)$	Membership function of pollutant 𝑗 at site 𝑖 for class 𝑘; dimensionless
m	Number of monitoring sites; dimensionless
n	Number of pollutants (e.g., PM_2.5, PM₁₀); dimensionless
s	Number of air quality grey classes; dimensionless
$x_{i j}$	Raw value of pollutant 𝑗 at site 𝑖; µg/m³
$x_{i j}^{\prime}$	Normalized value of pollutant 𝑗 at site 𝑖; dimensionless
$\bar{x}_j$	Arithmetic mean of the thresholds of pollutant 𝑗; µg/m³
$\eta_j^k$	Weight of pollutant 𝑗 in class 𝑘; dimensionless
$\sigma_i^k$	Clustering coefficient of monitoring site 𝑖 in class 𝑘; dimensionless
$\sigma_i^{k^*}$	Maximum clustering coefficient (final classification) at site 𝑖; dimensionless
Greek symbols
λ_k	Central midpoint value of class 𝑘; µg/m³
Subscripts
i	Index for monitoring sites
j	Index for pollutants (e.g., PM_2.5, PM₁₀)
k	Index for grey class (e.g., Good, Moderate)

Cite this:

Delgado, A. (2025). Grey Clustering Based Air Quality Index to Detect Urban Air Quality in Lima. Chall. Sustain., 13(4), 546-559. https://doi.org/10.56578/cis130406

©2025 by the author(s). Published by Acadlore Publishing Services Limited, Hong Kong. This article is available for free download and can be reused and cited, provided that the original published version is credited, under the CC BY 4.0 license.

Figure 1. Grey clustering-based AQI

Table 1. Study objects in the case study
