Comprehensive Exploration of Next-Day Electricity Price Forecasting Using CNN-GRU-VAE Hybrid Model
Abstract:
The unpredictable nature of energy markets makes precise electricity price forecasting (EPF) necessary for improving bidding strategies and reducing risk. This study introduces CNN-GRU-VAE, a hybrid deep learning model that extracts features with Convolutional Neural Networks (CNN), learns temporal sequences with Gated Recurrent Units (GRU), and improves generalization with a Variational Autoencoder (VAE). In one-day-ahead forecasting tests on UK and German market data, CNN-GRU-VAE outperformed the CNN, ANN, GRU, and CNN-GRU models. On the German dataset, the model achieved a Root Mean Squared Error (RMSE) of 0.8733, a Mean Squared Error (MSE) of 0.7627, and a Mean Absolute Error (MAE) of 0.6373. These findings indicate improved accuracy and stability across both markets. Integrating convolutional, recurrent, and generative components within a unified framework yields more accurate predictions than the benchmark methods, supporting the model's robustness and practical applicability for day-ahead electricity price forecasting in competitive energy markets.
1. Introduction
The modern power system, which includes generation, transmission, distribution, and consumption, is a key part of the world’s economic growth. Electricity prices are an essential economic indicator in this system, and they influence the behavior of all market participants. The shift from centralized, government-controlled electricity supply systems to liberalized, competitive markets in the early 1990s transformed the energy sector worldwide. In these markets, electricity is traded through spot markets and financial derivatives.
Electricity differs from other traded commodities in that it is essentially non-storable, which requires a continuous balance between supply and demand. Preserving this equilibrium is essential for the reliability and stability of the entire power grid [1], [2]. Electricity consumption and prices are influenced by multiple factors, including meteorological variables such as temperature, precipitation, and wind speed, as well as socioeconomic and behavioral factors such as time-of-day usage patterns, holidays, and weekday/weekend differences [3], [4].
These variables give electricity prices distinctive characteristics, including daily and seasonal patterns and unexpected spikes. The resulting price volatility influences resource allocation and directly affects the interaction between power producers and consumers [5], [6]. Stabilizing this market behavior is crucial for ensuring grid resilience and market efficiency [7]. However, the erratic and nonlinear behavior of electricity prices makes accurate forecasting very difficult.
Defining the forecasting horizon is a key step in electricity price forecasting (EPF). Electricity price forecasts usually fall into three horizons: short-term (hours to a few weeks), medium-term (weeks to months), and long-term (more than a year) [8], [9]. Each horizon serves a different purpose. Short-term EPF supports operational decisions such as real-time bidding and demand response. Medium-term forecasts support maintenance scheduling, expansion planning, and fuel procurement. Long-term forecasts inform infrastructure investment and energy policy.
This research examines short-term EPF using data from two electricity markets, the UK and Germany. The remainder of this paper is organized as follows: Section 2 reviews previous methods, Section 2.2 identifies the research gap that motivates this study, and Section 2.3 details the contributions of this work. Section 3 describes the research methodology, data processing, and the proposed Convolutional Neural Network-Gated Recurrent Unit-Variational Autoencoder (CNN-GRU-VAE) model. Section 4 presents the results and discussion, comparing the proposed model's performance with that of previous work. Finally, Section 5 concludes the paper with the main findings, limitations, and future work.
2. Literature Review
This section reviews recent EPF research, covering statistical, machine learning, and deep learning models and the results reported for each. The literature contains numerous approaches to the EPF problem [10]. Table 1 summarizes the inputs, methods, datasets, and time horizon of the studies discussed in this section. Statistical techniques are commonly used to forecast electricity prices. Conventional models such as the Autoregressive Moving Average (ARMA) [11] and the Autoregressive Moving Average with Exogenous inputs (ARMAX) [12] have been applied to EPF and peak-load forecasting, respectively. A Hilbert-operator-based ARMAX variant was proposed in [13] to estimate moving average (MA) terms in real time; evaluated on the German and Spanish power markets, it achieved better accuracy than competing models.
Further, an enhanced empirical mode decomposition combined with ARMAX and Adaptive Neuro-Fuzzy Inference Systems (ANFIS) was proposed in [14] for one-day-ahead forecasting on Spanish and Australian data, demonstrating greater predictive precision. The work in [15] introduced a hybrid Autoregressive Integrated Moving Average (ARIMA)-based method for the Iberian market that reduced hourly forecast errors. Although these statistical methods can forecast electricity prices, they rely primarily on linear relationships and have limited capacity to capture complex nonlinearities.
Machine learning (ML) models were developed to overcome the shortcomings of statistical approaches. Support Vector Machines (SVM) were used to forecast German electricity prices from historical electricity and gas data [16]. An Extreme Learning Machine (ELM) model presented in [17] achieved shorter computation times and smaller residual errors than previous research. Extreme Gradient Boosting (XGBoost) applied to Ontario data produced an MSE of 15.66 and an MAE of 3.74% [18]. However, single ML models can struggle with highly nonlinear, high-dimensional data.
Hybrid ML methods, such as Relevance Vector Machines (RVMs) combined with linear regression for New England data [19] and comprehensive ELM variants applied to Australian and Ontario datasets [10], have demonstrated improved accuracy. A stacked Extra Tree Regression (ETR) with Automatic Relevance Determination (ARD) achieved MAE, MSE, and RMSE of 2.03, 3.09, and 16.7 (£/MWh), respectively [20]. ML models have also been applied in related domains, such as photovoltaic power prediction [21], [22] and cardiovascular disease detection [23], [24]. Despite these improvements, ML models still have difficulty representing highly nonlinear, high-dimensional price dynamics, which has led researchers to explore deep learning (DL) architectures.
DL approaches address these limitations by modeling intricate nonlinear interactions. DL has been applied to tasks such as fault identification [25], [26] and torque estimation [27]. For EPF, Artificial Neural Networks (ANN) have been used for short-term forecasting [28], and the N-BEATS algorithm produced RMSEs of 6.78 and 4.46 for multivariate and univariate forecasts of Ontario data, respectively [29]. Adam-optimized Long Short-Term Memory (LSTM) models have reduced errors compared with conventional LSTMs [30]. Convolutional Neural Networks (CNN) have been used for short-term EPF [31], and hybrid DL architectures (such as CNN-LSTM [32], Wavelet-LSTM [33], and WT-SAE-LSTM [34]) have further improved forecasting accuracy. Other hybrid approaches, including ANN with Artificial Cooperative Search (ACS) [35], ANFIS-based two-stage models [36], and the SEPNet model combining CNN, Variational Mode Decomposition (VMD), and GRU [37], have achieved notable performance gains. Recent studies have explored attention-based CNN models [38] and transparent attention-based DL models for spot pricing [39], emphasizing both accuracy and interpretability.
Table 1. Overview of the inputs, methods, datasets, and time horizons of the reviewed EPF studies.
Reference | Inputs | Method | Datasets | Time Horizon |
[40] | 1, 24, 168 lagged EP values | ACBFS-VMD-BOHB-LSTM | PJM Regulation Zone | Short-term (day ahead) |
[39] | 168 hourly lagged price observations | ANN-ACS | Ontario electricity market | Short-term |
[22] | Historical data | ARD-ETR | Nord Pool market | Day ahead |
[41] | Historical price, bid load, temperature | ATTnet | New York City electricity data | Short-term (1.5 h ahead) |
Proposed | 168 hourly lagged price observations | CNN-GRU-VAE | UK & German electricity markets [42], [43] | One-day ahead |
[44] | Electricity prices over the past 24 h | CNN-LSTM | Preliminary Billing Data for PJM Zone | Short-term (1 h ahead) |
[45] | 24, 168, 720 lagged EP values | CNN-LSTM | German spot price data | Short-term (day, week, month ahead) |
[46] | Different inputs via feature selection | CNN-LSTM Encoder-Decoder | – | Short-term (day ahead) |
[44] | 168 hourly lagged price observations | CNN-Self attention | Ontario electricity market | Short-term (day ahead) |
[47] | 24 h past prices | LSTM, CNN-LSTM | Iranian electricity market | – |
[36] | 168 hourly lagged price observations | MOBBSA-ANFIS | Ontario electricity market | Short-term |
[31] | Load demand, wind speed, and electricity price | N-BEATS | Ontario market | Short-term (1 h ahead) |
[21] | Historic price observations | RVMs-LR | New England electricity market | Short-term |
[34] | 168 hourly lagged price observations | SEPNet | New York City electricity data | Short-term (1 h ahead) |
CNN-LSTM hybrids perform well, but they remain sensitive to changes in input size and to sudden price changes, which can degrade forecasts when the market is volatile. Moreover, existing forecasting models lack adequate mechanisms to quantify prediction uncertainty, which is critical for risk management in electricity trading. Little research has combined generative models, such as Variational Autoencoders (VAE), with CNN-GRU architectures to address both accuracy and uncertainty quantification. This work addresses these gaps by proposing and testing a new CNN-GRU-VAE hybrid model that combines the CNN's local feature extraction, the GRU's temporal modeling, and the VAE's probabilistic calibration to produce more accurate and reliable one-day-ahead electricity price forecasts under a variety of market conditions.
This study presents a unified architecture that integrates convolutional neural networks for feature extraction, gated recurrent units for sequence modeling, and a Variational Autoencoder for assessing uncertainty in one-day-ahead electricity price forecasting.
The CNN-GRU-VAE model is compared with baseline models such as ANN, CNN, GRU, and CNN-GRU on two real-world datasets from the UK and German electricity markets using standard metrics (MSE, RMSE, MAE).
The proposed CNN-GRU-VAE model achieves the lowest errors in both markets, indicating that it adapts to different patterns of demand and volatility.
3. Research Methodology
This section details how the CNN-GRU-VAE model predicts electricity prices one day ahead, covering data collection, preprocessing, model configuration, training, and evaluation.
Hourly electricity price data were collected from two markets: the UK power exchange and the German electricity market [42], [43]. After cleaning and standardization, each dataset was split into training and test sets, which were then used to train and evaluate the proposed CNN-GRU-VAE and the benchmark models.
The model design includes CNN layers for feature extraction, GRU units for sequence learning, and a VAE module to improve generalization. Section 4 presents the experimental results and the comparative performance of the proposed CNN-GRU-VAE model.
This research uses actual electricity price data from the UK and German markets to evaluate the proposed CNN-GRU-VAE forecasting model. The UK data, stored in a single CSV file, contain the 24 hourly readings for each day of 2021. Data from the German power market, known for its single-settlement framework and significant price fluctuations, were also included. Forecasting German electricity prices is challenging because of the strong influence of demand trends. The German data were recorded hourly from January 1, 2016, to August 31, 2017.
Mean imputation was used to fill missing values. The data were sourced from utility companies, so they are not affected by noisy sensor readings; nevertheless, price-data problems such as spikes and oscillations had to be addressed during preprocessing. The dataset was divided into two segments, 70% for training and 30% for testing, and a sliding-window approach was used throughout for time-series prediction. Exploratory analysis of hourly and monthly averages showed that prices rose steadily over the winter, particularly at night, with the same patterns in Germany and the UK. Z-score normalization, computed from the dataset's mean and standard deviation, was applied to make the data more consistent and neural network training more efficient.
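The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes a one-dimensional array of hourly prices with missing values encoded as `NaN`, a chronological (unshuffled) 70/30 split, and imputation and normalization statistics taken from the training portion only. The last two choices are common safeguards against leaking test-period information but are assumptions, since the text does not specify them. The function name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(prices, train_frac=0.7):
    """Mean-impute, split 70/30 chronologically, then z-score normalize.

    Assumption: statistics come from the training portion only.
    Returns (train_z, test_z, mu, sigma).
    """
    x = np.asarray(prices, dtype=float)
    n_train = int(len(x) * train_frac)
    train, test = x[:n_train].copy(), x[n_train:].copy()

    # Mean imputation: replace missing values (NaN) with the training mean.
    fill = np.nanmean(train)
    train[np.isnan(train)] = fill
    test[np.isnan(test)] = fill

    # Z-score normalization: subtract the mean, divide by the standard deviation.
    mu, sigma = train.mean(), train.std()
    return (train - mu) / sigma, (test - mu) / sigma, mu, sigma

# Usage with a short dummy series containing two missing hours.
prices = [50.0, np.nan, 54.0, 52.0, 58.0, 60.0, np.nan, 55.0, 57.0, 53.0]
train_z, test_z, mu, sigma = preprocess(prices)
restored = test_z * sigma + mu  # invert the z-score to recover prices
```

Keeping `mu` and `sigma` is what allows model outputs to be mapped back to price units before computing errors.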
This study uses CNNs, GRUs, and a VAE to predict electricity prices one day in advance. The input series $X=\{x_1,x_2,\dots,x_T\}$ contains hourly price measurements, and a 168-hour sliding window is used to predict the next 24 hours.
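A minimal sketch of how 168-hour input windows and 24-hour targets could be built from an hourly series. The stride of one hour between consecutive windows is an assumption (the text does not state it); a stride of 24 would instead give one sample per day. The function name `make_windows` is hypothetical.

```python
import numpy as np

def make_windows(series, input_len=168, horizon=24, stride=1):
    """Return (inputs, targets): inputs[i] holds `input_len` consecutive hourly
    prices and targets[i] the following `horizon` hours."""
    x = np.asarray(series, dtype=float)
    inputs, targets = [], []
    last_start = len(x) - input_len - horizon  # last start index with a full target
    for start in range(0, last_start + 1, stride):
        inputs.append(x[start:start + input_len])
        targets.append(x[start + input_len:start + input_len + horizon])
    return (np.array(inputs).reshape(-1, input_len),
            np.array(targets).reshape(-1, horizon))

# Usage: 200 hours of dummy prices give 200 - 168 - 24 + 1 = 9 samples.
series = np.arange(200.0)
X_win, Y_win = make_windows(series)
```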
The convolutional (Conv) layer is designed to identify short-term patterns and capture local dependencies within the input variables. It consists of several filters, each with width $w$ and height $n$, where $n$ is the number of input variables. Applying the $k$-th filter to the input matrix $X$ yields $h_k = \mathrm{ReLU}(W_k * X + b_k)$, where $*$ denotes the convolution operation, $W_k$ and $b_k$ are the filter weights and bias, $h_k$ is the resulting output vector, and the activation function is $\mathrm{ReLU}(x) = \max(0, x)$.
The convolution is performed with zero-padding on the left of the input matrix $X$ so that each output vector $h_k$ keeps the input length $T$. The output of the convolutional layer therefore has dimensions $d_c \times T$, where $d_c$ is the number of convolutional filters.
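A filter of width $w$ produces only $T-w+1$ outputs on an unpadded length-$T$ input, so keeping the output length at $T$ requires $w-1$ zero columns on the left (so-called causal padding). The NumPy sketch below illustrates this for a filter bank of shape $d_c \times n \times w$. It loops explicitly for readability rather than speed and, like most deep learning frameworks, implements the "convolution" as a cross-correlation (the filter is not flipped). The names `conv_layer`, `W`, and `b` are illustrative; in practice a framework layer such as Keras `Conv1D` with `padding="causal"` would be used.

```python
import numpy as np

def conv_layer(X, W, b):
    """Left-padded 1-D convolution followed by ReLU.

    X: input of shape (n, T), n variables over T time steps.
    W: filters of shape (d_c, n, w), d_c filters of width w and height n.
    b: biases of shape (d_c,).
    Returns H of shape (d_c, T), one length-T vector h_k per filter.
    """
    n, T = X.shape
    d_c, n_w, w = W.shape
    assert n_w == n, "filter height must equal the number of input variables"
    # Pad w-1 zero columns on the left so the output keeps length T.
    Xp = np.concatenate([np.zeros((n, w - 1)), X], axis=1)
    H = np.empty((d_c, T))
    for k in range(d_c):
        for t in range(T):
            # Window covers original time steps t-w+1 .. t (past and present only).
            H[k, t] = np.sum(W[k] * Xp[:, t:t + w]) + b[k]
    return np.maximum(H, 0.0)  # ReLU(x) = max(0, x)

# Usage: one variable, T = 5, one all-ones filter of width 3.
X = np.arange(1.0, 6.0).reshape(1, 5)
W = np.ones((1, 1, 3))
H = conv_layer(X, W, np.zeros(1))  # H -> [[1., 3., 6., 9., 12.]]
```

Because each output position only sees the current and earlier time steps, changing a future input never alters earlier outputs.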
The GRU architecture improves on traditional recurrent neural networks (RNNs) by adding gating mechanisms that regulate the flow of information, allowing the model to capture long-term dependencies in sequential data efficiently. Whereas LSTMs use three gates and a separate memory cell, GRUs use only two gates, the update gate and the reset gate [48]. This simpler architecture improves computational efficiency while maintaining comparable performance. In the proposed model, the Conv layer extracts local features from the input sequence, and these features are passed to the GRU layer, which learns temporal correlations by updating its internal state through its gates. A ReLU activation is used in the GRU layer to help the model capture complex temporal patterns.
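The update/reset gating described above can be written out as a single NumPy GRU step following the standard GRU equations. One assumption here is that the ReLU mentioned in the text replaces the usual tanh on the candidate state, which is what passing `activation="relu"` to a Keras `GRU` layer does. Conventions differ on whether the update gate $z_t$ weights the old or the new state (Keras uses the opposite form to the one below). The names `gru_step`, `gru_forward`, `init_params`, and the parameter keys are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def relu(a):
    return np.maximum(a, 0.0)

def init_params(d_in, hidden, seed=0):
    """Small random weights for the update (z), reset (r), and candidate (h) blocks."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in ("z", "r", "h"):
        p["W" + g] = rng.normal(0.0, 0.1, (hidden, d_in))    # input weights
        p["U" + g] = rng.normal(0.0, 0.1, (hidden, hidden))  # recurrent weights
        p["b" + g] = np.zeros(hidden)                         # biases
    return p

def gru_step(x_t, h_prev, p, act=relu):
    """One GRU time step; `act` is the candidate-state activation (tanh is the usual default)."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])         # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])         # reset gate
    h_cand = act(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])  # candidate state
    return (1.0 - z) * h_prev + z * h_cand  # blend previous and candidate state

def gru_forward(X_seq, p, hidden):
    """Run the cell over a (T, d_in) sequence and return the final hidden state."""
    h = np.zeros(hidden)
    for x_t in X_seq:
        h = gru_step(x_t, h, p)
    return h

# Usage: 5 time steps of 3 features (e.g. CNN outputs) into 4 hidden units.
p = init_params(d_in=3, hidden=4)
h = gru_forward(np.ones((5, 3)), p, hidden=4)
```

When the update gate is close to 0, the cell simply carries its previous state forward, which is how a GRU preserves information across long spans.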
A VAE head maps the 256-dimensional GRU output to a latent Gaussian space to handle heavy-tailed price spikes. The encoder outputs a mean $\mu$ and a log-variance $\log \sigma^2$, and the latent vector is sampled via the reparameterization trick, $z=\mu+\sigma \odot \varepsilon$ with $\varepsilon \sim \mathcal{N}(0,I)$. The decoder reconstructs the 24-hour forecast from $z$. Training minimizes the reconstruction MSE plus a $\beta$-weighted Kullback-Leibler (KL) divergence ($\beta=10^{-3}$), which encourages a smooth latent space that captures systematic uncertainty and extreme events.
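The reparameterization step and the training objective can be sketched in NumPy. The KL term uses the closed form for a diagonal Gaussian against a standard normal prior, $\mathrm{KL} = -\tfrac{1}{2}\sum_j \left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$. Whether the KL is summed or averaged over latent dimensions and samples changes the effective scale of $\beta$. The choice below (sum over latent dimensions, mean over the batch) is an assumption, as are the function names.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I) and sigma = exp(0.5 * log_var)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)): summed over latent dims, averaged over the batch."""
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)
    return float(np.mean(kl))

def vae_loss(y_true, y_pred, mu, log_var, beta=1e-3):
    """Reconstruction MSE on the 24-hour forecast plus beta-weighted KL divergence."""
    recon = float(np.mean((y_true - y_pred) ** 2))
    return recon + beta * kl_to_standard_normal(mu, log_var)

# Usage: a batch of 2 windows with an 8-dimensional latent space.
rng = np.random.default_rng(0)
mu = np.zeros((2, 8))
log_var = np.zeros((2, 8))
z = reparameterize(mu, log_var, rng)  # one latent sample per window
loss = vae_loss(np.ones((2, 24)), np.zeros((2, 24)), mu, log_var)  # MSE 1, KL 0
```

Sampling through `mu + sigma * eps` keeps the randomness in `eps`, so gradients can flow to `mu` and `log_var` during training.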
The CNN extracts local patterns, the bidirectional GRU (bi-GRU) encodes long-term dependencies, and the VAE improves generalization through latent-space regularization. At inference, the decoder output provides the point estimate for day-ahead price forecasting. The complete model was trained end-to-end with the following hyperparameters: the Adam optimizer with a learning rate of 0.001, a batch size of 64, and a maximum of 100 epochs. The CNN module uses two convolutional layers with 64 and 128 filters, respectively, and the GRU consists of two layers with 128 hidden units each. A dropout rate of 0.1 was applied after each layer, and the KL term was weighted by $\beta = 0.001$ to balance reconstruction accuracy and latent-space regularization. Early stopping with a patience of 10 epochs, monitoring the validation loss, was used to prevent overfitting and reduce training time. The model was implemented in TensorFlow and trained on an NVIDIA GPU, typically converging within 50–60 epochs. This configuration is intended to produce noise-robust forecasts while retaining some interpretability through the structured latent-space representation.
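The training configuration listed above can be collected in one place, and the patience-based early-stopping rule can be made concrete with a short, framework-free function. In TensorFlow the corresponding callback would be `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)`; the pure-Python function below is only a simplified stand-in for illustration, and the dictionary keys and function name are hypothetical.

```python
# Hyperparameters as stated in the text; the dictionary layout is illustrative.
CONFIG = {
    "optimizer": "adam",
    "learning_rate": 1e-3,
    "batch_size": 64,
    "max_epochs": 100,
    "conv_filters": (64, 128),
    "gru_units": (128, 128),
    "dropout": 0.1,
    "kl_beta": 1e-3,
    "early_stopping_patience": 10,
}

def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-indexed, for per-epoch validation losses.

    Training stops once `patience` consecutive epochs fail to improve on the best loss.
    """
    best, best_epoch, wait = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Usage: the loss plateaus after epoch 3, so training stops 10 epochs later.
losses = [1.0, 0.8, 0.6, 0.5] + [0.55] * 12
stop, best = early_stopping_epoch(losses, patience=CONFIG["early_stopping_patience"])
```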
To assess the accuracy of the CNN-GRU-VAE model, three performance metrics were used: Root Mean Squared Error (RMSE), Mean Squared Error (MSE), and Mean Absolute Error (MAE). These metrics quantify prediction errors, are widely adopted in time-series forecasting and electricity price prediction studies [10], [21], [22], [44], [46], and provide a common basis for comparing the model with alternatives. Because the paper targets deterministic one-day-ahead point forecasts, the models are assessed using MAE, RMSE, and MSE only; no p-values or confidence intervals from formal hypothesis tests are reported.
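For reference, the three metrics are $\mathrm{MSE}=\frac{1}{N}\sum_{i=1}^{N}(y_i-\hat{y}_i)^2$, $\mathrm{RMSE}=\sqrt{\mathrm{MSE}}$, and $\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}|y_i-\hat{y}_i|$, where $y_i$ is the actual price and $\hat{y}_i$ the forecast. A minimal NumPy version follows; the function names are illustrative. Because RMSE is the square root of MSE, the identity also serves as a quick consistency check on reported results: for CNN-GRU-VAE on the UK data in Table 2, $\sqrt{4.5130} \approx 2.1244$, matching the reported RMSE.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error."""
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.mean(e ** 2))

def rmse(y, y_hat):
    """Root mean squared error: the square root of the MSE, in price units."""
    return float(np.sqrt(mse(y, y_hat)))

def mae(y, y_hat):
    """Mean absolute error."""
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(e)))

# Usage: forecast errors of -1, 0, 1, -2 give MSE = 1.5 and MAE = 1.0.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 2.0, 2.0, 6.0]
scores = (mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

MSE and RMSE penalize large errors (such as missed price spikes) more heavily than MAE, which is why the three are usually reported together.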
4. Results and Discussion
Model performance was evaluated using the RMSE, MAE, and MSE metrics, and convergence behavior and forecast plots were analyzed separately for each dataset.
Table 2 compares the forecasting models (ANN, CNN, GRU, CNN-GRU, and the proposed CNN-GRU-VAE) on the UK and German electricity markets. The CNN-GRU-VAE model achieved the lowest errors on both datasets. In the UK market, it reached an MAE of 1.582, compared with 2.226 for the next-best model, GRU. In the German market, its MAE of 0.637 was the lowest of all models (GRU: 0.694), indicating higher accuracy. Figure 1 shows forecasts against actual prices in Germany: Figure 1a shows a weekly profile from June 19 to 26, 2017, while Figures 1b and 1c show daily profiles for July 25 and August 31, 2017, respectively. Figure 2 shows forecasts against actual prices in the UK: Figure 2a shows a weekly profile from October 26 to 31, 2017, while Figures 2b and 2c show daily profiles for September 26 and October 20, respectively.
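Table 2. Forecasting errors of the benchmark models and the proposed CNN-GRU-VAE on the UK and German datasets.
Metric | Dataset | ANN | CNN | GRU | CNN-GRU | CNN-GRU-VAE |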
MSE | UK | 16.49659 | 25.79399 | 10.00713 | 28.74186134 | 4.513042739 |
RMSE | UK | 4.0616 | 5.078779 | 3.163405 | 5.361143659 | 2.124392322 |
MAE | UK | 3.263641 | 3.783223 | 2.226243 | 4.019293662 | 1.582000366 |
MSE | Germany | 12.47296 | 5.001639 | 0.887457 | 5.261977817 | 0.762747195 |
RMSE | Germany | 3.531708 | 2.236435 | 0.942049 | 2.293900132 | 0.873353992 |
MAE | Germany | 3.223836 | 1.66531 | 0.693797 | 1.720458785 | 0.6373411 |
Table 3 compares the proposed CNN-GRU-VAE model with several benchmark methods from the literature. For a fair comparison, all benchmarked studies use the same forecasting horizon (24-hour day-ahead predictions) and are evaluated on UK or German electricity market data, the same markets used in this study. Given these consistent metrics and comparable horizons, the comparison provides a reasonable basis for performance assessment. The CNN-GRU-VAE model achieves the lowest MAE (1.582 for the UK and 0.637 for Germany) and the lowest RMSE (2.124 for the UK and 0.873 for Germany), indicating better generalization and more accurate predictions than the other leading methods. Notably, the proposed model performs better in both markets, which suggests robustness and adaptability to different market dynamics and price-volatility patterns.
Table 3. Comparison of the proposed model with published results on UK and German electricity market data.
Reference | Year | Model | Dataset | MAE | RMSE |
[22] | 2022 | ARD-ETR | UK | 2.03 | 3.59 |
[48] | 2022 | SA-DELM | UK | 3.8 | 4.7 |
Proposed | 2025 | CNN-GRU-VAE | UK | 1.582 | 2.124 |
[49] | 2016 | X-Model | Germany | 4.35 | 6.46 |
[50] | 2020 | FFNN | Germany | 7.08 | 9.41 |
[19] | 2021 | DFNN | Germany | 3.424 | 5.927 |
[51] | 2021 | CNN-LSTM Encoder-Decoder | Germany | – | 4.97 |
Proposed | 2025 | CNN-GRU-VAE | Germany | 0.637 | 0.873 |
5. Conclusions
EPF is important for efficient power market operations, where price volatility strongly influences market participants’ strategic decisions and risk management. This study introduced a new hybrid deep learning model (CNN-GRU-VAE) for day-ahead electricity price prediction that integrates a convolutional block for feature extraction, a GRU for temporal sequence modeling, and a VAE for probabilistic calibration and uncertainty quantification.
Evaluations on the UK and German electricity market datasets showed that the CNN-GRU-VAE model outperforms the other models, including ANN, CNN, GRU, and CNN-GRU. The proposed model attained an RMSE of 0.8733, an MSE of 0.7627, and an MAE of 0.6373 on the German dataset, and an RMSE of 2.124, an MSE of 4.513, and an MAE of 1.582 on the UK dataset. These outcomes indicate that integrating these components within a unified framework improves forecasting accuracy and stability compared with traditional and hybrid deep learning approaches.
The proposed model offers practical value for electricity market stakeholders. Energy traders and utility companies can use its improved forecasts to develop more informed bidding strategies and robust risk management protocols, potentially reducing operating costs and improving their competitive position. System operators can use accurate price predictions to optimize grid operations, better integrate renewable energy sources, and enhance market stability by anticipating price fluctuations.
Although the proposed model achieved good results, it has some limitations. First, it relies exclusively on univariate historical price data and does not incorporate exogenous variables such as weather conditions and fuel prices, which could enhance forecasting accuracy. Second, although the datasets represent major electricity markets, they are limited in geographic and temporal scope. Lastly, systematic hyperparameter optimization was not performed, leaving room for further performance improvements.
Future studies should address these issues by adding relevant exogenous variables to the input feature set and conducting systematic hyperparameter optimization. In addition, examining the model's interpretability through feature importance analysis would provide valuable insight into which factors most strongly influence electricity price predictions, enhancing transparency in model-driven decision-making. Extending the model to additional electricity markets with diverse regulatory frameworks and evaluating its performance under extreme market conditions would further validate its robustness and generalizability. Finally, this study evaluates deterministic one-day-ahead point forecasts using MAE, RMSE, and MSE only; no p-values, confidence intervals, or probabilistic forecast metrics are reported, so formal significance testing and probabilistic evaluation remain for future work.
Conceptualization, Z.F. and E.A.H.; methodology, Z.F.; software, Z.F.; validation, Z.F. and E.A.H.; formal analysis, Z.F.; investigation, Z.F.; resources, Z.F.; data curation, Z.F.; writing—original draft preparation, Z.F.; writing—review and editing, Z.F. and E.A.H.; visualization, Z.F.; supervision, E.A.H.; project administration, Z.F. and E.A.H.; funding acquisition, E.A.H. All authors have read and agreed to the published version of the manuscript.
The data used to support the research findings are available from the corresponding author upon request.
The authors declare no conflict of interest.
