Forecasting Crude Oil Prices by Using ARIMA Model: Evidence from Tanzania
Abstract:
The fluctuation in the price of crude oil on the global market has drawn considerable attention from researchers investigating its price movement. This study addresses the problem of predicting crude oil prices under unusual circumstances. The Box-Jenkins methodology was used to analyze the monthly dynamics of the Brent oil price from January 2002 to April 2022. Data were first differenced to achieve stationarity, and then the ACF and residual diagnostics were used to choose candidate models. The performance of the various models was evaluated, and ARIMA (0, 1, 1) was found to be the best model for forecasting crude oil prices. The study further reveals that, although the corona virus and the Ukraine war had a considerable impact on crude oil prices, such a model is still capable of capturing the underlying volatility in crude oil prices. Oil demand suddenly decreased as a result of the corona outbreak, but then abruptly increased as a result of the conflict in Ukraine. Therefore, there is a need to update the ARIMA model in order to best predict the price of crude oil in a time of exceptional circumstances. Because of the nature of the world oil market, predictions for the medium and long term are often incorrect; therefore, we have limited the scope of our forecasts in this study to a single year in order to achieve the highest level of accuracy.
1. Introduction
Undoubtedly, one of the most important commodities in today’s world is crude oil (Selvi, Kaviya Shree, & Krishnan, 2018). Many of the things we use every day depend on oil for their manufacturing and transportation, and the oil-producing sector has a significant influence on other industries. Any change in the price of petroleum products has an enormous impact on the costs of other items produced and even on the health of the economy (Fondo et al., 2021). Oil price swings have a cascading impact on our daily lives, affecting things like food supply, detergents, prescription drugs, and household appliances, to name a few. As with any other commodity, the price of oil changes in reaction to the market forces of demand and supply. As oil prices rise or fall, they negatively affect some economic sectors and positively affect others (Mensah, 2015). This makes crude oil one of the main economic issues and a major topic of debate in global economic policy.
Over the last two decades, the financial crisis caused by the housing bubble had a clear negative impact on oil prices, which fell from US$ 133.88 per barrel in June 2008 to under $40 per barrel within a few months of the crisis. Prices then rose until they reached $100 in 2014, but this did not persist for long, and they dropped below $30 in 2016 as a result of the increased supply of crude oil. Due to the impacts of Covid 19, prices in 2020 decreased to their lowest point in 20 years, reaching $ 16.55 in April. As a result of the war in Ukraine, however, prices increased once more. These shifts in the international price of crude oil undoubtedly have an impact on nations that rely significantly on imported crude oil (Nyongesa & Wagala, 2016; Rodhan & Jaaz, 2022; Selvi et al., 2018).
Tanzania has been exporting natural gas for more than 50 years and is a significant producer of the fuel. The first natural gas discovery in Tanzania was made at Songo Songo Island (Lindi Region), followed by Mnazi Bay (Mtwara Region). However, Tanzania neither produces crude oil nor has made any recent commercial oil discovery (Saxena & Ndule, 2020). An average of 35,000 barrels of refined oil products are consumed daily in Tanzania, all of which are imported. The consumption of crude oil and petroleum products makes up the greatest portion of all commercial energy consumption, and it has been expanding quickly since 2010 along with an increase in motorization (JICA, 2022; ITA, 2021). Their supply, however, is fully dependent on imports. The Bank of Tanzania reports that, in addition to an increase in oil imports, domestic prices also climbed in line with prices on the international market (BOT, 2022), reflecting the effects of COVID-19 and the aftermath of the war in Ukraine.
Tanzania relies heavily on crude oil imports for the majority of its industrial and socioeconomic operations (Shakiru & Liu, 2022). Because oil is used to transport goods and services, its price has an impact on the entire economy. The price of goods and services tends to rise when the price of oil rises. Moreover, a rise in gasoline and diesel prices affects everyone and contributes to inflation (Saxena & Ndule, 2020). The rapid growth of Tanzania's economy has increased the demand for crude oil, which has a multiplier effect on the economy. Several authors have examined the asymmetric relationship between oil price shocks and macroeconomic fluctuations in Tanzania and found that crude oil prices have an impact on GDP and inflation (Shakiru & Liu, 2022; Ndule, 2019), on the variability of the Tanzanian shilling’s exchange rates (Mwankemwa, Kibona & Said, 2020) and even on stock market performance (Kasongwa & Minja, 2022). It is thus crucial to conduct a thorough analysis and forecast of crude oil prices, since they have a multiplier effect on the production and distribution of goods and on the economy at large.
A stochastic modeling approach that incorporates the time-dependent structure present in crude oil price data can be used to study the dynamics and evolution of crude oil prices. One of the key components of an economic analysis is forecasting. Past observations can be used to forecast future values by identifying suitable models to reflect the data. Several analytical techniques have recently gained a lot of traction in forecasting crude oil prices. These include GARCH-type models (Ahmed & Shabri, 2014; Ng’ang’a & Oleche, 2022), the Support Vector Machine (SVM), which is used to forecast highly volatile data (Mensah, 2015), and the Autoregressive Integrated Moving Average (ARIMA) model, popularly known as the Box-Jenkins methodology (Wiri & Tuaneh, 2019; Rodhan & Jaaz, 2022; Shambulingappa et al., 2020).
Typically, ARIMA models are put forth for linear time series, capturing linear properties in the time series (Ahmed & Shabri, 2014). Owing to the accuracy, mathematical soundness, and flexibility of ARIMA techniques, which combine autoregressive (AR) and moving average (MA) processes with differencing, these techniques have been primarily employed for load forecasting (Mombeini & Yazdani-Chamzini, 2014).
There have been prior attempts to use the ARIMA model to forecast the price of crude oil (Mensah, 2015; Wiri & Tuaneh, 2019; Rodhan & Jaaz, 2022). This study, however, addresses the problem of predicting crude oil prices under unusual circumstances. Oil demand suddenly decreased as a result of the corona outbreak, but then abruptly increased as a result of the conflict in Ukraine. Therefore, there is a need to update the ARIMA model in order to best predict the price of crude oil in a time of exceptional circumstances. Because of the nature of the world oil market, predictions for the medium and long term are often incorrect (Rodhan & Jaaz, 2022). We have therefore limited the scope of our forecasts in this study to a single year in order to achieve the highest level of accuracy.
2. Literature Review
Utilizing the Box-Jenkins methodology for a precise forecast, Mensah (2015) looked at the monthly Brent oil price dynamics over the last two decades. The study compared the accuracy of numerous models’ predictions of crude oil prices. In the face of oil price volatility, a non-parsimonious ARIMA (1,1,1) model proved to be the most effective forecasting model; forecast accuracy was evaluated using the MSE and MAE measures.
Bichanga (2018) likewise attempted to develop statistical models for predicting Kenyan petroleum prices. Numerous Autoregressive Integrated Moving Average models, including ARIMA (1,1,0), ARIMA (1,1,1), and ARIMA (1,1,2), were utilized. Since ARIMA (1,1,0) had the lowest AIC value under the AIC criterion and log likelihood, it was found to be more accurate at predicting petroleum prices. Rodhan & Jaaz (2022) also used ARIMA models to analyze time-series data and make predictions. Their study looked at 375 months of WTI crude oil price data from January 1990 to March 2021. Data on WTI prices were acquired from the US Energy Information Administration (EIA). The ARIMA approach was used to make forecasts for the next 12 months. The outcomes of many models were evaluated, and the ARIMA (1,1,4) model was shown to be the most accurate forecasting model.
Shah & Kiruthiga (2020) also applied ARIMA in crude oil price forecasting. They examined the time series and nonlinear features of oil prices. According to their results, ARIMA (0,1,4) was the most appropriate model for predicting oil prices. Selvi et al. (2018) also used the ARIMA model to make predictions; they concluded from their time series analysis that the ARIMA model they had established was adequate. Their projections for the years 2017 through 2021 were produced using the model and indicated that crude oil prices would rise over the course of the following year. They suggested that prices should be stabilized and that extra attention should be paid to monitoring oil prices, because a steady increase in oil costs could be a significant problem for a country’s economy in the future.
Suleiman et al. (2015) looked at the most effective GARCH and ARIMA models for accurately forecasting the price of crude oil in Nigeria. The 189 monthly crude oil price observations used in their analysis covered the period from January 1998 to September 2013. Based on criteria such as AIC, HQC, and SIC, their study evaluated fifteen (15) models and chose the top ARIMA and GARCH models. The model with the lowest values of the criteria was deemed to be the best model. According to their findings, the best models for predicting the crude oil price data series were ARIMA (3, 1, 1) and GARCH (2, 1). The AR(1), AR(3), and MA(1) terms were significant at the 0.05 significance level. Their projection, which was developed for a period of six months, indicated a sharp increase in the price of crude oil when compared to historical averages.
In contrast, authors who combined or compared ARIMA with other models, such as Mensah (2015), show that the ARIMA model alone is not able to capture the volatility inherent in the crude oil price for an accurate forecast. Fondo et al. (2021) also made predictions of petroleum prices for the next twelve months. Due to data volatility, both the ARIMA and the VAR models were applied. The VAR model had the least error; hence, according to their study, VAR is a better model for predicting petroleum prices in Kenya. Ahmed & Shabri (2014) also forecasted crude oil prices, comparing the performance of Support Vector Machines (SVM) with that of ARIMA and GARCH using data on the West Texas Intermediate (WTI) crude oil price. The results revealed that the SVM method outperformed the other two in terms of forecast accuracy, achieving the smallest forecast error as judged by RMSE and MAE, followed by ARIMA and then GARCH. Shah & Kiruthiga (2020) reached similar conclusions on the SVM method.
3. Research Methodology
A time series design was used in this study. This design was chosen because crude oil prices tend to fluctuate over time. Using monthly crude oil prices from January 2002 to April 2022 as a suitable time series, the ARIMA model was fitted to historical data in order to forecast future crude oil prices.
Monthly Brent crude oil spot prices spanning January 2002 to April 2022, obtained from the Bank of Tanzania website, were used in this study for modeling and forecasting. These data were chosen because Brent crude oil is considered the benchmark for crude oils in Europe and Africa (Dunn, Holloway, et al., 2012). Crude oil is also traded on its own, or its price is reflected in the price of other types of crude oil (Ng’ang’a & Oleche, 2022). The accessibility of the data was another consideration in this selection.
R Statistical Software was used to analyze the time series data in order to develop a model for predicting the future crude oil price. To simplify the process, the data first had to be made stationary. A log transformation was applied to the dataset (Mensah, 2015), followed by differencing, to make the series stationary. The analysis also involved diagnostic test values and time series plots, such as the ACF and PACF graphs, which were necessary for the study.
Autocorrelation is also known as serial correlation. It measures the degree of association between observations of the current series and a lagged version of the same series across successive time periods. The formula for computing the autocorrelation coefficient between $x_t$ and $x_{t-k}$, k lags apart, is given by
$\gamma_k=\frac{\sum_{t=k+1}^{n}\left(x_t-\bar{x}\right)\left(x_{t-k}-\bar{x}\right)}{\sum_{t=1}^{n}\left(x_t-\bar{x}\right)^2}$ for $\mathrm{k}=0,1,2,3, \ldots, \mathrm{n}-1$
where $\bar{x}$ is the mean of the given time series, $x_t$ is the observation at time period $t$, $x_{t-k}$ is the observation $k$ lags earlier, and $n$ is the number of observations.
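As a minimal sketch of how this coefficient might be computed, the Python function below sums the products of deviations over time for each lag and divides by the total sum of squared deviations. The function name `sample_acf` and the short price list are illustrative assumptions and are not the study's data or code.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((v - mean) ** 2 for v in x)
    acf = []
    for k in range(max_lag + 1):
        # Numerator: products of deviations that are k periods apart.
        num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
        acf.append(num / denom)
    return acf


# Hypothetical short series, for illustration only (not the study's data).
prices = [60.0, 62.5, 61.0, 65.0, 70.0, 68.5, 72.0, 75.5]
rho = sample_acf(prices, max_lag=3)
```

By construction the coefficient at lag 0 equals 1, and a slow decay of the remaining coefficients is the pattern usually read as a sign of non-stationarity.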
4. Partial Autocorrelation
Partial autocorrelation at lag k is the relationship between $x_t$ and $x_{t-k}$ once the impacts of the intervening variables have been taken into account. This function was established as a component of the Box-Jenkins approach to time series modeling, in which it is used to identify the proper lag of an AR process or of an extended ARIMA (p, d, q) model.
In an autoregressive process, a dependent variable is expressed in terms of its own prior values. An autoregressive model is applied to stationary data, that is, data whose mean, variance, and autocorrelation function remain constant over time. The following is a generalized autoregressive process of order p.
$x_t=\alpha+\theta_1 x_{t-1}+\theta_2 x_{t-2}+\theta_3 x_{t-3}+\ldots+\theta_p x_{t-p}+\varepsilon_t$
where $x_t$ is the dependent variable to be predicted, $x_{t-1}, x_{t-2}, x_{t-3}, \ldots, x_{t-p}$ are past values of the series at lags $t-1, t-2, t-3, \ldots, t-p$ respectively, $\alpha$ is the constant parameter of the model, $\theta_1, \theta_2, \theta_3, \ldots, \theta_p$ are the parameters to be estimated and $\varepsilon_t$ is the error term, which is normally distributed with zero mean and a constant variance ($\sigma^2$).
A moving average model links the current value with the previous random errors. A moving average model of order q is written as follows
$x_t=\mu+\phi_1 \varepsilon_{t-1}+\phi_2 \varepsilon_{t-2}+\phi_3 \varepsilon_{t-3}+\ldots+\phi_q \varepsilon_{t-q}+\varepsilon_t$
where $x_t$ is the dependent variable to be predicted, $\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, \varepsilon_{t-q}$ are the error terms of the process at lags $t-1, t-2, t-3, \ldots, t-q$ respectively, $\mu$ is the average level of the moving average process, $\phi_1, \phi_2, \phi_3, \ldots, \phi_q$ are the parameters of the moving average process to be estimated and $\varepsilon_t$ is the current error term, which is normally distributed with zero mean and a constant variance $\left(\sigma^2\right)$.
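To illustrate the moving average process, the following minimal Python sketch simulates a first-order case (q = 1) and returns its theoretical lag-1 autocorrelation, $\phi_1 /\left(1+\phi_1^2\right)$. The function names, the seed, and the parameter values are illustrative assumptions and are not taken from the study.

```python
import random


def simulate_ma1(phi, n, sigma=1.0, mu=0.0, seed=0):
    """Simulate n observations of x_t = mu + phi * eps_{t-1} + eps_t."""
    rng = random.Random(seed)
    eps_prev = rng.gauss(0.0, sigma)  # shock preceding the first observation
    series = []
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)   # current normally distributed shock
        series.append(mu + phi * eps_prev + eps)
        eps_prev = eps
    return series


def ma1_lag1_autocorrelation(phi):
    """Theoretical lag-1 autocorrelation of an MA(1) process."""
    return phi / (1.0 + phi ** 2)


# Hypothetical parameter value, chosen only for illustration.
simulated = simulate_ma1(phi=0.5, n=200)
rho_1 = ma1_lag1_autocorrelation(0.5)
```

For an MA(1) process the theoretical autocorrelations beyond lag 1 are zero, which is why a single significant spike in the ACF is commonly taken as a hint of a first-order moving average term.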
An autoregressive integrated moving average (ARIMA) model is a modification of the autoregressive moving average (ARMA) framework used in the analysis of time series. These two models are used in time series analysis to better understand the given dataset and predict future values. When the data are not stationary, the ARIMA model is applied. Stationarity in the data may be attained by differencing the data once or more. A generalized ARIMA of order (p, d, q) is a combination of autoregressive and moving average processes of order p and q respectively, applied after differencing the series d times.
$x_t=\alpha+\theta_1 x_{t-1}+\theta_2 x_{t-2}+\ldots+\theta_p x_{t-p}+\phi_1 \varepsilon_{t-1}+\phi_2 \varepsilon_{t-2}+\ldots+\phi_q \varepsilon_{t-q}+\varepsilon_t$
To decide which ARIMA model is best for prediction, the autocorrelation function (ACF) and partial autocorrelation function (PACF) are employed. The ARIMA model can be estimated by following the Box-Jenkins methodology (Zhao & Wang, 2014).
The primary focus of this study was the use of the Box-Jenkins (1976) technique, which involves the following phases: model specification, model selection, diagnostic checking, and time series forecasting. The phases of this technique are described in detail below.
Utilizing a stationary time series is one of the prerequisites for the Box-Jenkins approach. If the series is not stationary, it must go through a logarithmic transformation to lower the level of variability before being differenced to make it stationary. The stationarity test is carried out by analyzing the correlogram using the autocorrelation function (ACF) and partial autocorrelation function (PACF). If the ACF values are approximately zero at every nonzero lag, the data are treated as stationary (Agustin, 2019). After checking for stationarity, the next step is to find the ARIMA model. The Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF) are the two commonly used tools for choosing the ARIMA model.
This study employed the Akaike information criterion (AIC), which was first proposed by Akaike (1973), to determine which model was the best. AIC, a penalized-likelihood criterion, is a relative measure of the distance between the fitted likelihood function of the model and the real likelihood function of the data (Ng’ang’a & Oleche, 2022). The model with the lower AIC is considered to be the more accurate (Suleiman et al., 2015).
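As a hedged illustration of how such penalized-likelihood criteria are computed, the sketch below uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the number of observations, and ln L the maximized log-likelihood. The candidate names, log-likelihood values, and sample size are hypothetical and are not results from this study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: name -> (maximized log-likelihood, number of parameters).
candidates = {"model_a": (230.0, 2), "model_b": (231.0, 4)}
n_obs = 240  # hypothetical sample size

scores = {
    name: (aic(ll, k), bic(ll, k, n_obs))
    for name, (ll, k) in candidates.items()
}
# The preferred model is the one with the smallest criterion value.
best_by_aic = min(scores, key=lambda name: scores[name][0])
```

In this made-up example the richer model gains one unit of log-likelihood but pays for two extra parameters, so the simpler model is preferred.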
Diagnostic checking is used in the Box-Jenkins methodology to determine whether the residuals are white noise. To do this, the plots of the Auto Correlation Function (ACF) and Partial Auto Correlation Function (PACF) of the residuals are inspected for any remaining serial correlation. Additionally, the residuals are assumed to follow a normal distribution with a mean of zero and a constant variance (Nyongesa & Wagala, 2016).
The forecasting stage involves obtaining forecasts from the ARIMA model chosen in the model selection stage. These estimates are obtained by fitting the time series data to the ARIMA model. This stage also provides some checks on the suitability of the model; if the model does not meet the standards for suitability, it is discarded (Nyongesa & Wagala, 2016).
5. Findings and Discussion
For the initial investigation, time series plots for the given series were employed as shown in Figure 1.
Visual examination of the time plot reveals that the mean and variance are evidently non-constant, indicating that the data is not stationary.
In addition to visual inspection of the time plot, the autocorrelation function (ACF) for the price of crude oil also provides helpful information suggesting that the series is not stationary, as shown in Figure 2.
The ACF shows a steady decline as the number of lags increases. This behavior is anticipated when a time series is likely to display random walk behavior (Selvi et al., 2018). To achieve stationarity in the series, we calculate the log of the series and plot the new ACF.
The series was transformed to achieve stationarity by taking the first differences of the natural logarithm of the values. The transformation was done by creating a new variable $y_t=\ln \left(p_{t+1}\right)-\ln \left(p_t\right)$, where $p_t$ is the price of crude oil. The graph of the resulting returns is shown in Figure 3.
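The log-difference transformation can be sketched in a few lines of Python. The function name `log_returns` and the three prices are illustrative assumptions and are not the study's data.

```python
import math


def log_returns(prices):
    """First differences of the natural logarithm of a price series."""
    return [
        math.log(prices[t + 1]) - math.log(prices[t])
        for t in range(len(prices) - 1)
    ]


# Hypothetical prices in US$ per barrel, for illustration only.
prices = [100.0, 110.0, 99.0]
returns = log_returns(prices)
```

The transformed series is one observation shorter than the original, and the returns sum to the log of the ratio between the last and the first price.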
The time plot in Figure 3 suggests stationarity because the mean and variance of the series are now approximately constant. The crude oil price suddenly decreased as a result of the corona outbreak, but then abruptly increased as a result of the conflict in Ukraine, as shown in Figure 3.
The ACF of the transformed data is displayed in Figure 4. The ACF of the series shows that the prices of crude oil were stationary after log differencing.
In order to find the proper order of p and q for our model, we begin by visualizing the ACF and PACF against various lags.
The ACF and PACF graphs in Figure 5 give insufficient information about the order of the best ARIMA model. When the ACF and PACF of the differenced series are not sufficient to determine the order of p and q, the BIC (Bayes Information Criterion) and AIC (Akaike Information Criterion), among numerous other information-based criteria, are used to decide the order of p and q. We therefore refer to the AIC and BIC values shown in Table 1.
Criterion | ARIMA (0, 1, 0) | ARIMA (1, 1, 1) | ARIMA (0, 1, 1) | ARIMA (1, 1, 0) |
---|---|---|---|---|
AIC | -435.44 | -459.37 | -460.94 | -459.28 |
BIC | -431.94 | -448.88 | -453.95 | -452.28 |
Comparing the AIC and BIC values produced by fitting the various combinations of p and q, as given in Table 1, both criteria point to an ARIMA (0, 1, 1) model. The model with the lowest information criterion is the best. The best model to predict monthly crude oil prices was therefore found to be ARIMA (0, 1, 1). This result is contrary to the ones reported by Mensah (2015), Wiri & Tuaneh (2019) and Rodhan & Jaaz (2022). With $x_t$ denoting the first-differenced log price series, the model can be expressed mathematically as follows.
$x_t=0.3335 \varepsilon_{t-1}+\varepsilon_t$
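The selection reported in Table 1 can be checked mechanically. The short Python sketch below copies the AIC and BIC values from the table, picks the model with the smallest value of each criterion, and computes each model's AIC gap to the best one. The dictionary layout and variable names are illustrative choices only.

```python
# AIC and BIC values copied from Table 1.
table1 = {
    "ARIMA(0,1,0)": {"AIC": -435.44, "BIC": -431.94},
    "ARIMA(1,1,1)": {"AIC": -459.37, "BIC": -448.88},
    "ARIMA(0,1,1)": {"AIC": -460.94, "BIC": -453.95},
    "ARIMA(1,1,0)": {"AIC": -459.28, "BIC": -452.28},
}

# The preferred model minimizes the information criterion.
best_aic = min(table1, key=lambda m: table1[m]["AIC"])
best_bic = min(table1, key=lambda m: table1[m]["BIC"])

# Gap of each model's AIC to the best AIC (0 for the selected model).
delta_aic = {
    m: round(v["AIC"] - table1[best_aic]["AIC"], 2) for m, v in table1.items()
}
```

Both criteria select ARIMA (0, 1, 1), although the AIC gaps to ARIMA (1, 1, 1) and ARIMA (1, 1, 0) are below two units, so the margin over these two competitors is small.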
If the selected model is appropriate for modelling crude oil prices, the residuals should be a realization of white noise. That is, the residuals must be independent and should follow a normal distribution. We evaluate this property visually using time series residual plots. Additionally, we apply the Ljung-Box test to assess the autocorrelation of the residuals.
Figure 6 makes it evident that the residuals of the model are stationary and have no autocorrelation. The Ljung-Box test likewise gives a p-value of 0.8503 (greater than 0.05), so the hypothesis of no autocorrelation in the residuals is not rejected. The histogram plot indicates that the residuals are only approximately, not strongly, normally distributed.
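For reference, the Ljung-Box statistic is $Q=n(n+2) \sum_{k=1}^{m} \hat{\rho}_k^2 /(n-k)$, where $\hat{\rho}_k$ is the residual autocorrelation at lag $k$ and $m$ is the number of lags tested. The Python sketch below computes only the statistic; the p-value, which comes from a chi-square distribution, is omitted. The function name and the alternating test series are illustrative assumptions and are not the study's residuals.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((r - mean) ** 2 for r in residuals)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at lag k.
        num = sum(
            (residuals[t] - mean) * (residuals[t - k] - mean)
            for t in range(k, n)
        )
        rho_k = num / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


# Hypothetical, strongly autocorrelated series (alternating signs).
alternating = [1.0, -1.0] * 5
# Q is about 10.8 here, above 3.841, the 5% chi-square critical value
# with one degree of freedom, so white noise would be rejected.
q_stat = ljung_box_q(alternating, max_lag=1)
```

A large p-value, such as the 0.8503 reported above, corresponds to a small Q and therefore gives no evidence of remaining autocorrelation.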
The model appears to have a considerable level of forecasting power, given the small difference between the actual values and those projected by the model, as presented in Figure 7.
The predicted values and their 95% confidence intervals from the suggested ARIMA model for the next 12 months are displayed in Figure 7 and Table 2.
Month | Point Forecast | 95% Lower Confidence Limit | 95% Upper Confidence Limit |
---|---|---|---|
May 2022 | 4.613051 | 4.430902 | 4.795200 |
Jun 2022 | 4.613051 | 4.309448 | 4.916654 |
Jul 2022 | 4.613051 | 4.224244 | 5.001859 |
Aug 2022 | 4.613051 | 4.154611 | 5.071492 |
Sep 2022 | 4.613051 | 4.094241 | 5.131861 |
Oct 2022 | 4.613051 | 4.040198 | 5.185904 |
Nov 2022 | 4.613051 | 3.990832 | 5.235271 |
Dec 2022 | 4.613051 | 3.945104 | 5.280998 |
Jan 2023 | 4.613051 | 3.902312 | 5.323790 |
Feb 2023 | 4.613051 | 3.861954 | 5.364148 |
Mar 2023 | 4.613051 | 3.823657 | 5.402445 |
Apr 2023 | 4.613051 | 3.787134 | 5.438968 |
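The pattern in Table 2, a constant point forecast with steadily widening limits, is what an ARIMA (0, 1, 1) model without drift produces. The Python sketch below uses the textbook h-step forecast standard error $\sigma \sqrt{1+(h-1)(1+\phi_1)^2}$. It rests on several assumptions not stated in the study: that the tabulated values are on the natural-log scale, that the limits use the normal multiplier 1.96, and that $\sigma$ can be backed out from the one-step interval for May 2022. The function and variable names are hypothetical.

```python
import math


def arima011_forecast(point_forecast, phi, sigma, horizon, z=1.96):
    """Flat forecasts and limits for ARIMA(0,1,1) without drift."""
    rows = []
    for h in range(1, horizon + 1):
        # Forecast standard error h steps ahead.
        se = sigma * math.sqrt(1.0 + (h - 1) * (1.0 + phi) ** 2)
        rows.append(
            (h, point_forecast, point_forecast - z * se, point_forecast + z * se)
        )
    return rows


point = 4.613051                        # point forecast from Table 2
phi = 0.3335                            # estimated MA(1) coefficient
sigma = (4.795200 - point) / 1.96       # assumption: backed out from May 2022
rows = arima011_forecast(point, phi, sigma, horizon=12)

# Assumption: forecasts are log prices, so exponentiate for US$ per barrel.
price_level = math.exp(point)
```

Under these assumptions the sketch closely tracks the tabulated limits (for example, a lower limit of about 4.3094 for June 2022), and the point forecast corresponds to a price of roughly US$ 100.8 per barrel.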
6. Conclusions and Recommendations
Tanzania is growing towards high middle-income status as a major producer of natural gas, a product it has been exporting for more than 50 years, but it is still an importer of crude oil. According to the results of this study, crude oil prices are not stationary and have a non-constant mean and variance, which calls for the use of non-linear models. Additionally, the prices of crude oil sometimes fluctuate unpredictably. According to the study’s findings, the Autoregressive Integrated Moving Average model ARIMA (0, 1, 1) was an effective tool for forecasting crude oil prices. The generated predictions show an increasing trend in the price of crude oil during the next 12 months. Therefore, price adjustments are necessary and special attention should be given to monitoring oil prices.
The study recommends that more research be done to determine the efficacy of different models, such as simple exponential smoothing and generalized autoregressive conditional heteroscedasticity (GARCH), in modeling and forecasting Tanzanian oil prices at a time when oil prices have been significantly impacted by the corona virus and the Ukraine war. The most effective model for analyzing the volatility of crude oil prices can then be chosen after comparing these models with the ARIMA model. This study also proposes further research to identify additional variables affecting oil prices in Tanzania.