Modelling Zero-Inflated Time Series Count Data Using Covid-19 Data

Olumide S. Adesina; Lawrence O. Obokoh

Acadlore takes over the publication of IJCMEM from 2025 Vol. 13, No. 3. The preceding volumes were published under a CC BY 4.0 license by the previous owner, and displayed here as agreed between Acadlore and the previous owner. ✯ : This issue/volume is not published by Acadlore.

Open Access

Research article

Olumide S. Adesina*, Lawrence O. Obokoh

Johannesburg Business School, University of Johannesburg, 2092 Johannesburg, South Africa

International Journal of Computational Methods and Experimental Measurements | Volume 13, Issue 2, 2025 | Pages 273-279

https://doi.org/10.18280/ijcmem.130206

Received: 09-23-2024, Revised: 11-26-2024, Accepted: 12-06-2024, Available online: 06-29-2025

Abstract:

Time series count data, such as daily cases of Covid-19, require adequate modelling and forecasting. Traditional time series models have limitations in modelling time series count data, also known as unbounded N-valued data. This study carried out in-depth analyses of various models for fitting unbounded N-valued time series data. Models popularly used to fit time series counts, such as the Zero-Inflated Poisson, Zero-Inflated Binomial, and ARIMA, were compared with the integer-valued generalized autoregressive conditional heteroscedasticity (INGARCH) models. The investigation involved two critical aspects: simulation and real-life data analysis. First, we simulated time series count data, then fitted the competing models and compared their performance. The simulation outcomes consistently favoured the Negative Binomial (NB) INGARCH models, highlighting their suitability for count data modelling. Subsequently, we examined real-life Covid-19 data from Nigeria. The real-life data also yielded strong support for the NB INGARCH model. This study recommends further exploration of the NB INGARCH model, as it exhibits substantial promise in effectively modelling over-dispersed zero-inflated data. The current study contributes valuable insights into selecting appropriate models for time series count data, addressing the intricate challenges posed by this specialized data type. The overall outcome of the study also supports national planning and resource allocation for people needing health intervention.

Keywords: INGARCH, Negative binomial, Poisson, Zero-inflation, Dispersion, Unbounded data

1. Introduction

Time series count data are discrete time series that take zero and positive integer values. Dedicated models are suitable for fitting specific types of data: Poisson regression, the Negative Binomial, and other discrete distributions in the exponential family suit count data because of characteristics such as the link function. In the same light, models such as the Autoregressive Moving Average (ARMA), the Autoregressive Integrated Moving Average (ARIMA) and their extensions are suitable for fitting time series data. Time series count data are generated daily in various fields but are not always treated as they should be; examples include daily admissions or discharges of patients in health facilities [1] and daily new cases of Covid-19, to mention but a few. Traditional methods such as Ordinary Least Squares (OLS) usually break down when used to fit time series or count data. OLS faces problems such as heteroskedasticity, which causes overfitting when it is used to fit count data or time series data [2].

Numerous alternative models have been developed to address the problem of heteroscedasticity or overfitting. These models cover a wide range of statistical techniques, such as Discrete Weibull distributions, Dirichlet mixture models and COM-Poisson models [3]. The main goal is to offer robust methods for modelling count data that exhibit under- or over-dispersion. However, while these models have proven useful in a variety of contexts, their utility may not extend seamlessly to time series count data. The analysis of time series count data presents its own set of challenges and complexities, frequently necessitating distinct modelling strategies and methodologies.
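As a rough illustration of under- and over-dispersion, the sketch below computes the variance-to-mean ratio (often called the dispersion index) of a count sample. A Poisson sample has a ratio close to 1, values above 1 indicate over-dispersion, and values below 1 indicate under-dispersion. The function name `dispersion_index` and the toy counts are hypothetical and are not taken from the study.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance divided by the sample mean."""
    return variance(counts) / mean(counts)


# Hypothetical toy counts for illustration only (not the study's data).
toy_counts = [0, 0, 0, 1, 0, 5, 0, 12, 0, 2]

index = dispersion_index(toy_counts)
if index > 1:
    label = "over-dispersed"
elif index < 1:
    label = "under-dispersed"
else:
    label = "equi-dispersed"

print(f"dispersion index = {index:.2f} ({label})")
```

A ratio well above 1, as in this toy sample, is one informal signal that a plain Poisson model may be too restrictive.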

Addressing the complexities of zero-inflated count data presents an immense challenge in the realm of statistical analysis and time series modelling [4, 5]. These data, which are frequently distinguished by an excess of zero values and a non-standard distribution, are encountered in a variety of fields, ranging from epidemiology and finance to ecology and social sciences [6, 7]. Understanding and effectively modeling zero-inflated time series count data is a practical necessity, as it is used in predicting disease outbreaks [8], analyzing financial anomalies [9], and studying population dynamics [10], among many other domains.

Zero-inflation occurs in data when two distinct processes govern the observed counts: one that generates zeros more frequently than a standard distribution would predict, and another that generates non-zero counts [5]. These complexities necessitate the use of specialized modelling techniques to account for excess zeros, temporal dependencies, and other time series data-specific factors [11]. In this context, choosing an appropriate model is critical because it directly affects prediction accuracy and inference validity [12-14].
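One common way to formalise the two-process idea is the zero-inflated Poisson (ZIP) mixture, in which an observation is a structural zero with probability `pi` and otherwise a draw from a Poisson distribution with mean `lam`. The sketch below simulates such data and compares the observed share of zeros with what a plain Poisson would predict. The parameter values, the seed, and the helper names are assumptions chosen for illustration, not values from the study.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_zip(pi, lam, rng):
    """Zero-inflated Poisson: structural zero with probability pi, else Poisson(lam)."""
    if rng.random() < pi:
        return 0
    return sample_poisson(lam, rng)


def zip_zero_probability(pi, lam):
    """P(Y = 0) under the ZIP mixture: pi + (1 - pi) * exp(-lam)."""
    return pi + (1.0 - pi) * math.exp(-lam)


rng = random.Random(42)          # fixed seed so the run is repeatable
pi, lam, n = 0.3, 2.0, 20000     # illustrative values only

draws = [sample_zip(pi, lam, rng) for _ in range(n)]
observed_zero_share = draws.count(0) / n
poisson_zero_share = math.exp(-lam)             # zeros expected from Poisson alone
zip_zero_share = zip_zero_probability(pi, lam)  # zeros expected under the mixture

print(f"observed zeros:      {observed_zero_share:.3f}")
print(f"Poisson prediction:  {poisson_zero_share:.3f}")
print(f"ZIP prediction:      {zip_zero_share:.3f}")
```

The gap between the observed share of zeros and the Poisson prediction is the "excess" that zero-inflated models are designed to absorb.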

This study considers the critical task of comparing different models for fitting zero-inflated time series count data. Its objectives are to model time series count data using an appropriate model, a step that is often ignored, and to demonstrate that model's robustness and adequacy in fitting count data by using various model selection criteria and a scoring function. The study intends to provide useful insights for scholars and practitioners dealing with zero-inflated time series count data by examining various methodologies and shedding light on best practices for modelling.

This study evaluates several models, including ARIMA, zero-inflated binomial, zero-inflated Poisson, and integer-valued generalized autoregressive conditional heteroscedasticity (INGARCH), and assesses their applicability to count modelling based on distributions such as the Poisson, the linear and quadratic negative binomial, the double Poisson and the generalized Poisson [15]. The selected models underwent a thorough assessment of their capacity to represent the data's zero-inflated nature, capture temporal dependencies, and produce precise forecasts.
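To make the INGARCH idea concrete, the sketch below simulates a Poisson INGARCH(1,1) series using the standard recursion from the count time series literature, in which the conditional mean depends on the previous count and the previous conditional mean: lam_t = intercept + obs_coef * y_{t-1} + mean_coef * lam_{t-1}. The parameter values, seed, and function names are assumptions for illustration and do not reproduce the study's simulation design.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_ingarch11(n, intercept, obs_coef, mean_coef, rng, burn_in=200):
    """Simulate n observations from a Poisson INGARCH(1,1) process.

    lam_t = intercept + obs_coef * y_{t-1} + mean_coef * lam_{t-1}
    y_t | past ~ Poisson(lam_t)
    Requires obs_coef + mean_coef < 1 for a finite stationary mean.
    """
    lam = intercept / (1.0 - obs_coef - mean_coef)  # start at the stationary mean
    y = sample_poisson(lam, rng)
    series = []
    for t in range(n + burn_in):
        lam = intercept + obs_coef * y + mean_coef * lam
        y = sample_poisson(lam, rng)
        if t >= burn_in:  # discard the warm-up draws
            series.append(y)
    return series


rng = random.Random(123)
intercept, obs_coef, mean_coef = 1.0, 0.3, 0.4   # illustrative values only
series = simulate_ingarch11(5000, intercept, obs_coef, mean_coef, rng)

stationary_mean = intercept / (1.0 - obs_coef - mean_coef)
sample_mean = sum(series) / len(series)
print(f"theoretical mean = {stationary_mean:.3f}, sample mean = {sample_mean:.3f}")
```

A negative binomial variant would keep the same recursion for the conditional mean and replace only the Poisson draw, which is what allows it to capture additional over-dispersion.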

The findings of this comparative study will help us better understand how to model zero-inflated time series count data and will also assist researchers in finding the best model for their particular use cases [16]. Navigating the wide range of available modelling techniques is crucial because the calibre and dependability of the insights obtained from the data depend on the choice of model [17]. This exploration of zero-inflated time series count data is expected to be enlightening, offering innovative perspectives on the opportunities within the field [18]. The remainder of the paper is organised as follows: Section 2 presents the methodology, Section 3 the results, Section 4 the discussion, and Section 5 the summary and conclusion.

2. Methodology

3. Results

3.1 Simulation

4. Discussion

This study focuses on comparing various models for fitting zero-inflated time series count data, using Covid-19 statistics. The study addresses the challenge of modelling time series count data, which often involves over-dispersion and zero-inflation. Various models, including Poisson INGARCH, Negative Binomial INGARCH, Zero-Inflated Poisson, Zero-Inflated Binomial, and ARIMA, are assessed in terms of their suitability and performance. Two critical aspects of analysis are considered: simulation and real-life data.

In the simulation phase, the Negative Binomial INGARCH models perform exceptionally well, showing superiority over the other models, namely Poisson INGARCH, Zero-Inflated Poisson, Zero-Inflated Binomial, and ARIMA. This is evident from the AIC and BIC comparisons. In the real-life data analysis, using Covid-19 statistics, the Negative Binomial INGARCH (NB INGARCH) model demonstrates the best fit based on the model selection criteria, particularly AIC and BIC. This result implies that the NB INGARCH model holds substantial promise in effectively modelling over-dispersed and zero-inflated data, which is characteristic of Covid-19 case counts. The results also suggest that new Covid-19 cases will remain consistent over a period. The implication of the study is that a more reliable model can be adopted for modelling unbounded N-valued integer data, such as daily Covid-19 data. This would help policy makers prepare adequately for health facilities and the required funding. The study aligns with Sustainable Development Goal (SDG) 8, which focuses on promoting sustained, inclusive economic growth. The limitation of the study lies in the lack of access to data on new Covid-19 cases in other countries for comparative studies.
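The AIC and BIC comparisons referred to above follow the usual definitions, AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L, where k is the number of estimated parameters, n the sample size, and log L the maximised log-likelihood; lower values indicate a better trade-off between fit and complexity. The sketch below applies them to simulated over-dispersed counts, comparing an independent Poisson fit with an independent negative binomial fit. It is a simplified illustration only: it ignores serial dependence, uses a coarse grid search as an approximate maximum likelihood step for the negative binomial size parameter, and all names and values are assumptions rather than the study's procedure.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def poisson_loglik(counts, mean):
    """Log-likelihood of i.i.d. Poisson(mean) counts."""
    return sum(y * math.log(mean) - mean - math.lgamma(y + 1) for y in counts)


def negbin_loglik(counts, mean, size):
    """Log-likelihood of i.i.d. negative binomial counts (mean / size form)."""
    log_p = math.log(size / (size + mean))
    log_q = math.log(mean / (size + mean))
    return sum(
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * log_p + y * log_q
        for y in counts
    )


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * loglik


# Simulated over-dispersed counts via a gamma-Poisson mixture (illustrative values).
rng = random.Random(7)
true_mean, true_size, n_obs = 3.0, 1.0, 1000
counts = [
    sample_poisson(rng.gammavariate(true_size, true_mean / true_size), rng)
    for _ in range(n_obs)
]

mean_hat = sum(counts) / n_obs                     # ML estimate of the mean in both models
size_grid = [0.1 * i for i in range(1, 201)]       # coarse grid for the NB size parameter
size_hat = max(size_grid, key=lambda s: negbin_loglik(counts, mean_hat, s))

ll_pois = poisson_loglik(counts, mean_hat)         # 1 parameter
ll_nb = negbin_loglik(counts, mean_hat, size_hat)  # 2 parameters

aic_pois, bic_pois = aic(ll_pois, 1), bic(ll_pois, 1, n_obs)
aic_nb, bic_nb = aic(ll_nb, 2), bic(ll_nb, 2, n_obs)

print(f"Poisson: AIC = {aic_pois:.1f}, BIC = {bic_pois:.1f}")
print(f"NegBin:  AIC = {aic_nb:.1f}, BIC = {bic_nb:.1f} (size ~ {size_hat:.1f})")
```

On over-dispersed data of this kind the negative binomial fit attains the lower AIC and BIC despite its extra parameter, which mirrors the type of comparison reported in the discussion.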

These results emphasize the significance of choosing an appropriate model that accounts for over-dispersion and zero-inflation, which are common characteristics of count data in various contexts. Adesina et al. [24] showed the superiority of ARFIMA over ARIMA in modelling Covid-19 data, and future studies can investigate the strength of such a model against the INGARCH models. The current study offers superior modelling relative to Chan et al. [25] and Busari and Samson [26], who used count regression models, ARIMA, and other machine learning models without giving attention to the count structure of the time series, although those studies found the negative binomial and ARIMA models, respectively, most appropriate.

5. Conclusion

The study has demonstrated the superiority of the Negative Binomial INGARCH(1,1) model over the competing models and contributes significant insights into the selection of appropriate models for time series count data, with a particular focus on zero-inflated data. The findings suggest that the Negative Binomial INGARCH model performed well with both simulated and real-life data.

The study recommends that future research validate these findings with diverse simulation approaches and other real data sources. Additionally, it encourages further exploration of the NB INGARCH model, as it shows potential for being a robust and versatile choice for modelling over-dispersed zero-inflated time series count data, filling a notable gap in the existing literature. This research, therefore, contributes valuable insights for both scholars and practitioners dealing with time series count data and offers a promising direction for future studies. Future research can also extend the INGARCH model to mixture models such as Dirichlet mixture models.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] Kakad, M., Utley, M., Dahl, F.A. (2023). Using stochastic simulation modelling to study occupancy levels of decentralised admission avoidance units in Norway. Health Systems, 12(3): 317-331.

[2] Cameron, A.C., Trivedi, P.K. (2005). Microeconometrics: Methods and Applications. Cambridge University Press.

[3] Adesina, O.S., Adekeye, K.S., Adedotun, A.F., Adeboye, N.O., Ogundile, P.O., Odetunmibi, O.A. (2023). On the performance of Dirichlet prior mixture of generalized linear mixed models for zero truncated count data. Journal of Statistics Applications & Probability, 12(3): 1169-1178.

[4] Feng, C. (2020). Zero-inflated models for adjusting varying exposures: A cautionary note on the pitfalls of using offset. Journal of Applied Statistics, 49(1): 1-23.

[5] Feng, C.X. (2021). A comparison of zero-inflated and hurdle models for modeling zero-inflated count data. Journal of Statistical Distributions and Applications, 8: 8.

[6] Agarwal, D.K., Gelfand, A.E., Citron-Pousty, S. (2002). Zero-inflated models with application to spatial count data. Environmental and Ecological Statistics, 9: 341-355.

[7] Adedotun, A.F., Adesina, O.S., Onasanya, O.K., Onos, E.S., Onuche, O.G. (2022). Count models analysis of factors associated with road accidents in Nigeria. International Journal of Safety and Security Engineering, 12(4): 533-542.

[8] Lu, J., Meyer, S. (2022). A zero-inflated endemic-epidemic model with an application to measles time series in Germany. Biometrical Journal, 65(8): 2100408.

[9] Shi, Y., Dai, W., Long, W. (2021). A new deep learning-based zero-inflated duration model for financial data irregularly spaced in time. Frontiers in Physics, 9: 651528.

[10] Pittman, B., Buta, E., Krishnan-Sarin, S., O’Malley, S.S., Liss, T., Gueorguieva, R. (2020). Models for analyzing zero-inflated and overdispersed count data: An application to cigarette and marijuana use. Nicotine and Tobacco Research, 22(8): 1390-1398.

[11] Adesina, O.S., Agunbiade, D.A., Oguntunde, P.E. (2021). Flexible Bayesian Dirichlet mixtures of generalized linear mixed models for count data. Scientific African, 13: e00963.

[12] Hacker, R.S., Hatemi-J, A. (2022). Model selection in time series analysis: Using information criteria as an alternative to hypothesis testing. Journal of Economic Studies, 49(6): 1055-1075.

[13] Hasan, F.M., Hussein, T.F., Saleem, H.D., Qasim, O.S. (2024). Enhanced unsupervised feature selection method using crow search algorithm and Calinski-Harabasz. International Journal of Computational Methods and Experimental Measurements, 12(2): 185-190.

[14] Oluwadare, J.R., Adesina, O.S., Adedotun, A.F., Odetunmibi, O.A. (2024). Estimation techniques for generalized linear mixed models with binary outcomes: Application in medicine. International Journal of Computational Methods and Experimental Measurements, 12(3): 323-331.

[15] Aknouche, A., Almohaimeed, B.S., Dimitrakopoulos, S. (2022). Forecasting transaction counts with integer-valued GARCH models. Studies in Nonlinear Dynamics & Econometrics, 26(4): 529-539.

[16] Alwan, E.H., Al-Qurabat, A.K.M. (2024). Optimizing program efficiency by predicting loop unroll factors using ensemble learning. International Journal of Computational Methods and Experimental Measurements, 12(3): 281-287.

[17] Zhang, C., Han, J. (2021). Data Mining and Knowledge Discovery. In Urban Informatics, Springer, Singapore.

[18] Liu, M., Zhu, F., Li, J., Sun, C. (2023). A systematic review of INGARCH models for integer-valued time series. Entropy, 25(6): 922.

[19] Liboschik, T., Fokianos, K., Fried, R. (2017). tscount: An R package for analysis of count time series following generalized linear models. Journal of Statistical Software, 82(5): 1-51.

[20] Christou, V., Fokianos, K. (2014). Quasi-likelihood inference for negative binomial time series models. Journal of Time Series Analysis, 35(1): 55-78.

[21] R Core Team. (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.gbif.org/tool/81287/r-a-language-and-environment-for-statistical-computing.

[22] Liboschik, T., Fried, R., Fokianos, K., Probst, P. (2020). tscount: Analysis of count time series. https://cran.r-project.org/web/packages/tscount/index.html.

[23] Malki, Z., Atlam, E.S., Ewis, A., Dagnew, G., Alzighaibi, A.R., ELmarhomy, G., Elhosseini, M.A., Hassanien, A.E., Gad, I. (2021). ARIMA models for predicting the end of COVID-19 pandemic and the risk of second rebound. Neural Computing and Applications, 33: 2929-2948.

[24] Adesina, O.S., Onanaye, S.A., Okewole, D., Egere, A.C. (2020). Forecasting of new cases of Covid-19 in Nigeria using autoregressive fractionally integrated moving average models. Asian Research Journal of Mathematics, 16(9): 135-146.

[25] Chan, S., Chu, J., Zhang, Y., Nadarajah, S. (2021). Count regression models for COVID-19. Physica A: Statistical Mechanics and its Applications, 563: 125460.

[26] Busari, S.I., Samson, T.K. (2022). Modelling and forecasting new cases of Covid-19 in Nigeria: Comparison of regression, ARIMA and machine learning models. Scientific African, 18: e01404.

Cite this:

Adesina, O. S. & Obokoh, L. O. (2025). Modelling Zero-Inflated Time Series Count Data Using Covid-19 Data. Int. J. Comput. Methods Exp. Meas., 13(2), 273-279. https://doi.org/10.18280/ijcmem.130206
