Analysis of Urban Expansion Patterns and Land Use Changes in Cajamarca (Peru): An Integration of GIS, GEE and Predictive Models
Abstract:
Unplanned urban expansion poses significant challenges to sustainable territorial development in intermediate cities. This study analyzes the dynamics of urban expansion and land use change in the city of Cajamarca (Peru) during the period 1986−2040, integrating Geographic Information Systems (GIS) techniques, Google Earth Engine (GEE) and CA-Markov prediction models. Landsat satellite images from 1986, 2004 and 2022, classified with Random Forest (RF), were used to generate thematic maps and evaluate their accuracy. Subsequently, a spatial simulation model was implemented to project urban expansion to 2040. The results indicate an increase in the urban area from 798.68 hectares in 1986 to a projected 5,768.19 hectares in 2040, while forest and pasture and crop areas also increased and shrubland and grassland declined. The driving factors for this expansion include rural-urban migration, the availability of services, and real estate development. Projections highlight growth toward the east, southeast, and south of the city. This approach provides strategic inputs for sustainable urban planning and effective land management in transforming Andean cities.
1. Introduction
More than 50% of urban growth, or urban expansion, results from the interaction of global economic, social, and environmental processes [1]. Urban growth depends on demographic changes such as immigration, migration from rural areas to the city and population growth [2]. Soil, water, and the environment have been affected by unplanned urban expansion [3], [4], impacting agricultural areas, forests, and sustainable urban development [5], [6], [7], [8].
Therefore, the technical-scientific analysis of these areas is very important to better understand current and future urban growth patterns and trends [2], [9], [10]. Accurate, consistent, and updated information on urbanization trends is important for (i) the formulation of urban development plans and (ii) the definition of urban and rural area boundaries in order to ensure sustainable land use [10]. For this purpose, land use and land cover (LULC) change maps are used to help understand the dynamics and driving sources of future changes that cities must face [11]. These maps are generated using tools such as Geographic Information Systems (GIS) and remote sensing. In addition, various geospatial and statistical models are used for urban prediction [12].
Urban prediction models utilize machine learning (ML) algorithms, such as Cellular Automata (CA) [13], Markov Chains (MC) [14], CA-MC [15] and CA-logistic regression (LR) [16]. These models incorporate thematic layers such as LULC, elevation, shadow, slope, distance to roads, and rivers, among others [17]. Urban growth predictions have been applied in various countries, including Nepal [10], Bangladesh [18], Germany, India [15] and Brazil [19]. Continents such as the Americas and Europe have experienced urban growth processes in recent years, driven by population and economic development [20], [21]. Asia, in particular, is undergoing accelerated growth, with China exhibiting the highest rate of urban expansion, which threatens environmental quality and socioeconomic sustainability [22]. Meanwhile, in South America, countries face challenges related to urban growth, which exerts pressure on agriculture, water resources, fertilization, and labor [23], [24]. Peru has experienced urban growth in its main cities such as Lima, Callao, Arequipa, Trujillo, Piura and Huancayo [25]. However, no studies have yet been reported on the analysis of land-use expansion and dynamics in high Andean areas of northern Peru using tools such as cloud computing and predictive models, which could help to better understand these changes.
The city of Cajamarca, located in the northern Peruvian Andes, is not only a contemporary urban center but also a site of great historical significance. It played a pivotal role during the Inca period and within the broader context of the Tahuantinsuyo empire. Over the past 30 years, Cajamarca has experienced significant urbanization, driven by the expansion of built-up areas and mining activities [26].
Despite the extensive use of CA-Markov models globally, their application remains limited in intermediate Andean cities of South America. This study represents one of the first attempts to integrate GIS, cloud-based remote sensing, and CA-Markov modeling in Cajamarca. Our approach addresses this gap by offering a replicable method for predicting urban expansion in similar Andean settings, which remain understudied. Therefore, evaluating the urban growth dynamics of this city using GIS for the period 1986−2040 is essential to understanding how a city rooted in the legacy of the Incas is transforming under modern development pressures.
2. Materials and Methods
The city of Cajamarca is located in the district and province of Cajamarca, in northern Peru, at an altitude of 2750 m asl in the Andes mountain range (Figure 1). The city is considered a dynamic, densely populated center with approximately 300,000 inhabitants [27]. It provides public services, including spaces for commercial, tourism, and industrial activities [28].

The methodological flowchart for evaluating the urban expansion dynamics of Cajamarca through GIS for the period 1986–2040 is depicted in Figure 2. It includes the selection of satellite image mosaics for 1986, 2004 and 2022, which were processed through supervised classification in a GIS environment and on the Google Earth Engine (GEE) platform. Subsequently, the thematic accuracy of each generated map was assessed. Finally, the urban area (UA) suitability was evaluated using CA-MC projections for 2040.

The acquisition of spatial information included the official national cartography (hydrography and elevation) and the district road network. It also incorporated a Digital Elevation Model (DEM) with a resolution of 30 meters [29], along with the satellite imagery detailed in Table 1: Landsat 5 TM for 1986 and 2004, and Landsat 8 OLI for 2022. Only images with less than 30% cloud cover were selected, based on the average annual cloud coverage in the study area.
The satellite images were classified in GEE and projected to 2040 using IDRISI Taiga software. All spatial layers in vector format (points, lines and polygons) were georeferenced to the WGS 1984 Datum and UTM Zone 17 South, corresponding to the Cajamarca region.
Satellite | Year of Analysis | Selected Bands | ID in GEE |
Landsat 8 OLI | 2022 | Blue, Green, Red, NIR, SWIR1 and SWIR2 | LANDSAT/LC08/C02/T1_L2 |
Landsat 5 TM | 1986 and 2004 | Blue, Green, Red, NIR, SWIR1 and SWIR2 | LANDSAT/LT05/C02/T1_L2 |
Thematic studies on land use, urban growth, land-use plans and related research were compiled to provide an environmental [30], cultural and socioeconomic context for the city of Cajamarca. This document review made it possible to identify the physical and spatial factors influencing the urban expansion of Cajamarca [26].
To minimize the influence of atmospheric interference, cloud and cloud-shadow masks were applied using the Quality Assessment (QA) band provided in the Landsat surface reflectance datasets [31]. A combination of the Fmask algorithm and manual thresholding techniques was used to further remove residual cloud effects. Only scenes with less than 30% cloud coverage were selected, and composite mosaics were generated for each year using the median reflectance to reduce noise, yielding high-quality annual mosaics largely free of clouds and cloud shadows. The Blue, Green, Red, NIR, SWIR1 and SWIR2 bands were selected as predictor variables for supervised classification. These calibrated spectral bands were then combined to create multiband images for each year of analysis [32]. Additionally, spectral indices were calculated: the Normalized Difference Vegetation Index (NDVI), Modified Soil Adjusted Vegetation Index (MSAVI), Enhanced Vegetation Index (EVI), Green Chlorophyll Vegetation Index (GCVI), Normalized Difference Water Index (NDWI) and Atmospherically Resistant Vegetation Index (ARVI). Topographic variables, including DEM, slope and aspect, were also incorporated into the analysis.
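As an illustration of how the six spectral indices named above can be derived from the selected bands, the following minimal sketch computes them for a single pixel. It assumes surface reflectance values scaled to 0−1 and uses commonly published forms of each index (the McFeeters form of NDWI, the closed form of MSAVI, and ARVI with a weighting of 1); the text does not state which variants were applied in GEE, and the function name `spectral_indices` is hypothetical.

```python
import math


def spectral_indices(blue, green, red, nir):
    """Return the six indices for one pixel of 0-1 surface reflectance.

    The formula variants are assumptions; zero denominators are not handled.
    """
    ndvi = (nir - red) / (nir + red)
    # NDWI in the green/NIR (McFeeters) form, assumed here
    ndwi = (green - nir) / (green + nir)
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    gcvi = nir / green - 1.0
    # MSAVI in its closed form, which needs no soil-brightness parameter
    a = 2.0 * nir + 1.0
    msavi = (a - math.sqrt(a ** 2 - 8.0 * (nir - red))) / 2.0
    # ARVI replaces red with a red-blue term (weighting assumed equal to 1)
    rb = 2.0 * red - blue
    arvi = (nir - rb) / (nir + rb)
    return {"NDVI": ndvi, "MSAVI": msavi, "EVI": evi,
            "GCVI": gcvi, "NDWI": ndwi, "ARVI": arvi}


# Hypothetical reflectances for a densely vegetated pixel
example = spectral_indices(blue=0.03, green=0.06, red=0.04, nir=0.40)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

For a vegetated pixel such as the hypothetical one above, the vegetation indices are strongly positive and NDWI is negative, which is the kind of contrast the classifier can exploit to separate classes.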
Through field trips, training areas were georeferenced for forest (F), shrubland and grassland (SG), pasture and crops (PC), urban area (UA), and water (W) classes using a Global Positioning System (GPS) receiver. The training areas were uploaded to GEE, and then Random Forest (RF) supervised classification was applied to generate a single map for each year of analysis (1986, 2004 and 2022) [33]. To improve the classifications, a spatial filter was applied to remove noise from the classified images. Finally, the raster was exported to perform area calculations, estimate changes by period, and generate cartographic maps.
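The type of spatial filter is not specified above. One common choice for removing isolated misclassified pixels is a 3×3 majority (mode) filter, sketched below on a small grid of class labels. The filter type, the window size, and the function name `majority_filter` are assumptions made for illustration, not the procedure used in this study.

```python
from collections import Counter


def majority_filter(grid):
    """Apply a 3x3 majority filter to a 2D list of class labels.

    A pixel is relabelled only when one class holds a strict majority
    of its window; edge pixels use the part of the window inside the grid.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            label, count = Counter(window).most_common(1)[0]
            if count > len(window) / 2:
                out[r][c] = label
    return out


# Hypothetical example: one isolated UA pixel surrounded by SG
noisy = [
    ["SG", "SG", "SG"],
    ["SG", "UA", "SG"],
    ["SG", "SG", "SG"],
]
smoothed = majority_filter(noisy)
print(smoothed[1][1])
```

In this toy grid the isolated urban pixel is absorbed by the surrounding shrubland and grassland class, which is the salt-and-pepper effect such filters are meant to suppress.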
Average annual rates of change (s) were calculated for each class during the periods 1986−2004 and 2004−2022, using Eq. (1) as proposed by FAO [34]. In Eq. (1), $S_1$ and $S_2$ represent the area of each class at dates $t_1$ and $t_2$, respectively. A negative value of s indicates a loss in the class, while a positive value of s signifies an increase in the same class.
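Eq. (1) is not reproduced in this text. A widely cited FAO formulation of the annual rate of change is the compound form $s = (S_2/S_1)^{1/(t_2-t_1)} - 1$, and the sketch below assumes that form; if a logarithmic variant was used instead, the values would differ slightly. The function name `annual_rate_of_change` is hypothetical, and the areas are the UA and SG values reported for 1986 and 2004.

```python
def annual_rate_of_change(s1, s2, t1, t2):
    """Compound annual rate of change between areas s1 (at t1) and s2 (at t2).

    Assumes the compound FAO form; positive values indicate a gain.
    """
    return (s2 / s1) ** (1.0 / (t2 - t1)) - 1.0


# Areas in hectares for the first period (1986-2004)
ua_rate = annual_rate_of_change(798.68, 1550.46, 1986, 2004)
sg_rate = annual_rate_of_change(19523.35, 17653.10, 1986, 2004)
print(f"UA: {ua_rate:.2%} per year")
print(f"SG: {sg_rate:.2%} per year")
```

Under this assumed form, the urban area grows by roughly 3.8% per year in the first period, while the shrubland and grassland class shows a negative rate, consistent with the sign convention described above.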
The future prediction of urban growth was carried out using dynamic models that integrate MC-CA methodologies to model and visualize urban expansion [35]. The CA model was applied to perform future predictions using IDRISI Taiga software. The prediction was based on two types of variables: the classified maps obtained from Landsat data (for two specific dates, 2004 and 2022) and independent variables including: i) distance to roads, ii) distance to rivers, iii) distance to the population center, iv) elevation, and v) slope, which were used to generate the transition potential matrix [36]. Once the potential matrix was obtained, the CA model was applied to simulate the future urban growth map for the year 2040. Additionally, the model's accuracy was validated by comparing the simulated maps with those obtained from Landsat data for the same year in the study area.
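The Markov component of this workflow can be sketched as follows: a transition probability matrix is estimated by cross-tabulating two classified maps, and then applied to the later class areas to project one further step. Because 2004−2022 and 2022−2040 both span 18 years, a single application corresponds to the 2040 horizon. This is only an illustration of the Markov step on hypothetical toy grids; it does not reproduce the CA spatial allocation, the driver variables, or the IDRISI implementation, and the function names are hypothetical.

```python
def transition_matrix(map_t1, map_t2, classes):
    """Row-normalised cross-tabulation of two co-registered class grids."""
    counts = {a: {b: 0 for b in classes} for a in classes}
    for row1, row2 in zip(map_t1, map_t2):
        for a, b in zip(row1, row2):
            counts[a][b] += 1
    probs = {}
    for a in classes:
        total = sum(counts[a].values())
        # A class absent at t1 is assumed to persist unchanged
        probs[a] = {
            b: (counts[a][b] / total if total else float(a == b))
            for b in classes
        }
    return probs


def project_areas(areas, probs):
    """Apply one Markov step to a dict of class areas."""
    return {b: sum(areas[a] * probs[a][b] for a in probs) for b in probs}


# Hypothetical 2x3 toy maps (not the study's data)
classes = ["SG", "PC", "UA"]
map_2004 = [["SG", "SG", "PC"], ["SG", "UA", "PC"]]
map_2022 = [["SG", "UA", "PC"], ["UA", "UA", "UA"]]

probs = transition_matrix(map_2004, map_2022, classes)
areas_2022 = {c: sum(row.count(c) for row in map_2022) for c in classes}
areas_2040 = project_areas(areas_2022, probs)
print(areas_2040)
```

The projected areas sum to the same total as the input, since each row of the matrix sums to one. In a full CA-Markov run, these quantities would then be allocated in space according to suitability and neighborhood rules.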
The thematic accuracy of the generated maps was determined in three steps: calculating the number of reference areas, sampling design and data processing [37]. The number of reference areas was calculated using Cochran’s Eq. (2) [38] for each class of the generated maps (1986, 2004 and 2022). For each class, randomly distributed points were obtained.
where,
$s$: level of confidence (95%).
$p, q$: estimated hits and errors, respectively, with $q=1-p$.
$E$: error allowed (5%).
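Eq. (2) is not reproduced in this text. Cochran's formula for a large population is commonly written as $n = z^2 p q / E^2$, where $z$ is the standard normal value for the chosen confidence level (1.96 for 95%), which appears to correspond to the confidence term defined above. The sketch below assumes that form and assumes $p = 0.5$, the most conservative choice; the text does not report the value of $p$ that was used, and the function name is hypothetical.

```python
import math


def cochran_sample_size(z=1.96, p=0.5, e=0.05):
    """Number of reference areas from Cochran's large-population formula.

    z: standard normal value for the confidence level (1.96 for 95%)
    p: estimated proportion of hits (0.5 is an assumed, conservative value)
    e: allowed error
    """
    q = 1.0 - p
    return math.ceil(z ** 2 * p * q / e ** 2)


print(cochran_sample_size())
```

With these assumed inputs the formula gives 384.16, which rounds up to 385 reference areas. A less conservative value of $p$ would reduce the required number.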
Verification of the validation areas was performed both in the field and using high-resolution images [39]. Subsequently, metrics were calculated to evaluate the accuracy of the results [40]. User's accuracy (UA, not to be confused with the urban area class) was calculated to capture errors of commission, or inclusion, from the map user's perspective, while producer's accuracy (PA) captured errors of omission, or exclusion, from the map maker's perspective [41]. Additionally, overall accuracy (OA) [42] and the Kappa coefficient ($k$) were calculated (Eq. (3)).
where,
$r$: number of rows.
$n$: total number of verification areas.
$c_{nn}$: number of observations in row $n$ and column $n$.
$c_{n+},c_{+n}$: marginal totals of row $n$ and column $n$, respectively.
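Eq. (3) is not reproduced in this text. The sketch below assumes the standard Kappa formulation, in which the total count multiplied by the sum of the diagonal, minus the sum of the products of the row and column marginals, is divided by the squared total minus that same sum of products. It also derives overall, user's and producer's accuracy from the same confusion matrix. Rows are assumed to hold the mapped classes and columns the verified classes; the matrix values and the function name are hypothetical.

```python
def accuracy_metrics(matrix):
    """Accuracy metrics from a square confusion matrix.

    matrix[i][j] counts areas mapped as class i (row) and verified as
    class j (column); this orientation is an assumption.
    """
    n_classes = len(matrix)
    n = sum(sum(row) for row in matrix)
    diag = sum(matrix[i][i] for i in range(n_classes))
    row_tot = [sum(matrix[i]) for i in range(n_classes)]
    col_tot = [sum(matrix[i][j] for i in range(n_classes))
               for j in range(n_classes)]
    chance = sum(row_tot[i] * col_tot[i] for i in range(n_classes))
    return {
        "overall": diag / n,
        # User's accuracy: complement of commission error, per mapped class
        "users": [matrix[i][i] / row_tot[i] for i in range(n_classes)],
        # Producer's accuracy: complement of omission error, per verified class
        "producers": [matrix[i][i] / col_tot[i] for i in range(n_classes)],
        "kappa": (n * diag - chance) / (n ** 2 - chance),
    }


# Hypothetical two-class matrix with 100 verification areas
metrics = accuracy_metrics([[45, 5], [10, 40]])
print(metrics)
```

Kappa is lower than overall accuracy here because it discounts the agreement expected by chance from the row and column marginals.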
3. Results
Urban growth within a city can be attributed to a variety of interacting factors [43], [44]. Specifically, the urban growth factors identified in the city of Cajamarca are depicted in Figure 3, relating to the region and the specific context of the city, as described below:
(a) Demographic growth: Associated with the increase in population driven by demand for housing, employment, services, and urban activities overall.
(b) Migration: Population growth spurred by the arrival of individuals seeking housing, employment, and various urban opportunities.
(c) Economic development: Employment opportunities and the development of economic sectors, such as industry, commerce, and services, attract people to the city in search of work and improved living conditions.
(d) Infrastructure and services: The availability of robust infrastructure, including roads, public transportation, health services, education (both public and private universities), and other amenities, contributes to the city’s appeal as a place to live.
(e) Real estate investment and development: Investments in residential and commercial real estate projects drive urban growth by creating new housing options and business spaces.
(f) Education: The presence of quality educational institutions attracts students and professionals, generating significant growth in the urban population.

The implementation of satellite image preprocessing and classification in GEE included image correction, calculation of spectral indices, and derivation of topographic variables. The creation of predictor variables such as spectral bands (Blue, Green, Red, NIR, SWIR1, SWIR2), spectral indices (NDVI, MSAVI, EVI, GCVI, NDWI, ARVI), and topographic variables (DEM, slope and aspect) allowed for the identification of the F, SG, PC, UA and W classes in the study area.
The area and percentage of each class for each year of analysis are detailed in Table 2. The forest class (F) increased from 531.01 ha in 1986 to 2,549.58 ha in 2022. PC also tended to increase, from 3,035.02 ha to 4,521.41 ha over the same period. UA increased significantly from 798.68 ha to 3,849.21 ha between 1986 and 2022. Conversely, the SG class decreased from 19,523.35 ha to 12,970.98 ha over the same period.
LULC class | 1986 (ha) | % | 2004 (ha) | % | 2022 (ha) | % |
Forest (F) | 531.01 | 2.22 | 1,309.40 | 5.48 | 2,549.58 | 10.66 |
Shrub and grassland (SG) | 19,523.35 | 81.66 | 17,653.10 | 73.83 | 12,970.98 | 54.25 |
Pasture and crops (PC) | 3,035.02 | 12.69 | 3,379.10 | 14.13 | 4,521.41 | 18.91 |
Urban area (UA) | 798.68 | 3.34 | 1,550.46 | 6.48 | 3,849.21 | 16.10 |
Water (W) | 21.23 | 0.09 | 17.23 | 0.07 | 18.11 | 0.08 |
Total | 23,909.29 | 100.00 | 23,909.29 | 100.00 | 23,909.29 | 100.00 |
The spatial distribution of LULC for 1986, 2004 and 2022 is depicted in Figure 4. It is observed that the forest class (F) is mainly located in the center and north of the area, with a notable expansion towards the east and west by 2022.
The SG class shows a widely dispersed distribution throughout the region during the three years analyzed. The PC class is concentrated in the center of the area and has increased toward the north, east and southeast. The UA exhibits sustained growth from the central core of the city, expanding towards the northeast, east and southeast, indicating a pattern of peripheral urbanization. Water bodies (W), on the other hand, show fluctuations in their extent over time. Map validation metrics (Kappa index, user's accuracy, producer's accuracy, and OA) exceeded 83%, indicating high reliability of the classification results.
The changes in land cover and land use classes during two periods of analysis are presented in Figure 5 and Table 3. The first period (P1) covers the years 1986-2004, while the second period (P2) covers 2004 to 2022. A decrease in the change towards natural cover (9.07 to 5.92%), in change toward anthropic use (15.25 to 8.24%), and in the permanence of anthropic use (20.12 to 12.52%) is observed in P2. In contrast, the permanence of natural cover showed a significant increase, from 55.55% in P1 to 73.32% in P2.


Description | P1: 1986−2004 (ha) | % | P2: 2004−2022 (ha) | % |
Change to natural coverage | 2,168.86 | 9.07 | 1,416.18 | 5.92 |
Change to anthropic use | 3,646.78 | 15.25 | 1,969.83 | 8.24 |
Permanence of natural coverage | 13,282.12 | 55.55 | 17,529.17 | 73.32 |
Permanence of anthropic use | 4,811.53 | 20.12 | 2,994.11 | 12.52 |
Total | 23,909.29 | 100.00 | 23,909.29 | 100.00 |
The spatial variables depicted in Figure 6 were processed and incorporated as predictors in the Land Change Modeler module of IDRISI (https://clarklabs.org/terrset/idrisi-gis/, accessed May 13, 2025) with the objective of predicting urban growth to 2040. These variables included the DEM, distance to population centers, distance to rivers, distance to roads, and terrain slope.

The Land Change Modeler module predicted land cover and land use classes for 2040 from the 2004 and 2022 maps, with results indicating a decrease in the shrubland and grassland class and an increase in the forest, urban area, and pasture and crops classes. The maps generated reported an accuracy of more than 75%. The projected urban area of Cajamarca for 2040 is estimated at 5,768.19 ha, with expansion towards the western, southern and northeastern areas of the city (Figure 7).


Surface area loss for each class was also analyzed. In P1, P2, and P3 (2022−2040), the class that lost the most area was SG, followed by PC. Regarding the intensity of loss at the class level, W and F experienced significant reductions in P1, while in P2, W, SG, and F were most affected. In P3, the main losses continued to be in W, SG, and F.
The shrubland and grassland class exhibited changes, primarily showing a decrease in surface area. These changes were directed toward the forest, urban, and pasture and crops classes across all years of analysis. As illustrated in Figure 8, the urban class demonstrated a consistent increase throughout the years, with the forest, grassland and shrubland, and pasture and crop classes contributing the most to its expansion.
4. Discussion
LULC change is related to population growth and the development of anthropogenic activities. Evaluating the dynamics of urban expansion in Cajamarca using GIS techniques over the 54 years of analysis (1986−2040) allowed us to identify important changes. UA, PC and F increased at the expense of SG. Urban growth in Cajamarca is substantial [27], creating a vast and fragmented territory similar to other Peruvian cities such as Lima, Callao, Arequipa, Trujillo, Piura, and Huancayo [25]. This often contributes to environmental and social deterioration [45]. Urban areas represent only 2% of the world's surface yet house more than 55% of its population [46]. Therefore, city development must be planned sustainably in order to reduce environmental impacts and improve inhabitants' quality of life.
Multiple physical, socioeconomic and environmental factors have been identified as promoting urban growth in Cajamarca. These include population growth, migration, availability of basic services such as electricity, water, sewage, education, health, and investments that cause people to migrate from rural areas to urban areas. These factors often improve people's living conditions; however, in other cases, they can negatively impact the environment (e.g., solid waste dumps, environmental pollution, loss of biodiversity, conversion of forest and crops to urban areas) [47], [48]. Therefore, it is important to implement adequate urban planning in order to manage urban development and human activities within the physical environment [49], [50], addressing land use policy alternatives, complying with regulations, and coordinating with stakeholders at different administrative decision-making levels [50].
The study identified five LULC classes: forest, grassland and shrubland, pasture and crops, urban areas, and water. In percentage terms, the shrubland and grassland class will represent the largest area (41.72%), followed by urban area (24.13%), pasture and crops (19.76%), forest (14.24%) and water (0.16%) of the study area by 2040. Likewise, at the period level of analysis, there is a decrease in the SG class and an increase in the forest, pasture and crops and urban area classes. The increase in the forest class is likely related to reforestation projects similar to Granja Porcón, while the increase in the pasture and crops class is closely related to the installation of new agricultural and pasture plots [51], [52], [53].
Urban growth in the city of Cajamarca shows an upward trend from 798.68 ha (3.34%) to 5,768.19 ha (24.13%) from 1986 to 2040, respectively. Urban growth patterns are consistent with Canelo & Moscoso [54], who demonstrate that the city’s southward expansion has been primarily driven by the pressure of the informal land market, influenced by mining investment and limited action by authorities.
The application of tools such as GIS and GEE allowed the detection of LULC changes and the prediction of land use [55], [56], [57]. The possible urban expansion areas in Cajamarca will be located to the northeast, southeast and south of the city, and could be conditioned by the creation of new human settlements and private housing developments promoted by real estate agencies [58]. However, the GEE platform may have some limitations when working on small areas, such as the spatial resolution of the available imagery, which does not allow a detailed analysis of land cover. Additionally, satellite data may contain noise and errors that affect the accuracy of analyses, and the availability of historical data and the presence of clouds may also limit such investigations [59], [60]. Finally, this study reports on the possible urban expansion zones in Cajamarca, which can be considered as an input for institutions involved in planning and urban development framed within sustainable development and environmental conservation.
5. Conclusions
This study evaluated the dynamics of urban growth in the city of Cajamarca using GIS techniques and cloud-based processing over the period 1986−2040. Five LULC classes were identified: F, SG, PC, UA, and W, with results showing increases in F, PC, and UA, and a reduction in SG. These changes are associated with multiple physical, socioeconomic, and political-institutional factors, including population growth, migration, and the availability of basic services.
The CA-Markov model effectively predicted urban expansion patterns, indicating that growth will continue toward the northeast, southeast, and south of the city by 2040. These projections provide valuable inputs for sustainable urban planning and territorial management in transforming Andean areas.
Future research should integrate additional social and economic variables (e.g., income levels, education, housing policies), and explore alternative machine learning-based predictive models (e.g., ANN, SVM) to enhance forecasting accuracy and adaptability to different urban contexts.
Conceptualization, E.B., J.D.C.-C., R.E.G.-S., E.A.D.O., E.C.-C., and A.C.-S.; methodology, E.B., J.D.C.-C., R.E.G.-S., E.A.D.O., E.C.-C., and A.C.-S.; software, E.B. and J.D.C.-C.; validation, E.A.D.O., E.C.-C., and A.C.-S; formal analysis, E.B. and J.D.C.-C.; investigation, E.B., J.D.C.-C., R.E.G.-S., E.A.D.O., E.C.-C., and A.C.-S.; resources, E.B., J.D.C.-C., and E.A.D.O.; data curation, R.E.G.-S. and A.C.-S.; writing—original draft preparation, E.B. and J.D.C.-C.; writing—review and editing, R.E.G.-S., E.A.D.O., E.C.-C., and A.C.-S.; visualization, R.E.G.-S., E.A.D.O., E.C.-C., and A.C.-S.; supervision, E.B.; project administration, E.B.; funding acquisition, E.B. and E.A.D.O. All authors have read and agreed to the published version of the manuscript.
The data used to support the findings of this study are available from the corresponding author upon request.
The authors declare that they have no conflicts of interest.
