Time series count data are discrete time series that take zero and positive integer values. Dedicated models exist for count data, such as Poisson regression, negative binomial regression, and other discrete distributions in the exponential family, whose characteristics, such as the link function, suit them to this type of data. In the same light, models such as the Autoregressive Moving Average (ARMA), the Autoregressive Integrated Moving Average (ARIMA), and their extensions are suitable for fitting time series data. Time series count data are generated daily in various fields but are not always treated as they should be; examples include daily patient admissions or discharges in health facilities [1] and daily new cases of Covid-19 infection, to mention but a few. Traditional methods such as Ordinary Least Squares (OLS) usually break down when used to fit time series or count data. OLS faces problems such as heteroskedasticity, thereby causing overfitting when it is used to fit count data or time series data [2].
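The heteroskedasticity problem can be illustrated with a minimal Python sketch. It assumes, purely for illustration, Poisson counts whose mean follows a log-linear function of a single covariate; the coefficients, sample size, and variable names are hypothetical and are not taken from the data analysed in this study. Because the variance of a Poisson count equals its mean, the residuals of a straight-line OLS fit spread out as the mean grows.

```python
import numpy as np

# Hypothetical setup: Poisson counts with a log-linear mean in one covariate.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 3.0, 2000)
mu = np.exp(0.5 + 0.8 * x)          # conditional mean (illustrative coefficients)
y = rng.poisson(mu)                 # for a Poisson count, variance equals mean

# Straight-line OLS fit; np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Compare residual spread where the mean is small and where it is large.
resid_var_low = float(np.var(residuals[x < 1.5]))
resid_var_high = float(np.var(residuals[x >= 1.5]))

print(f"residual variance, x < 1.5 : {resid_var_low:.2f}")
print(f"residual variance, x >= 1.5: {resid_var_high:.2f}")
```

The residual variance in the upper half of the covariate range exceeds that in the lower half, which violates the constant-variance assumption underlying OLS.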
Numerous alternative models have been developed to address the problem of heteroscedasticity or overfitting. These models cover a wide range of statistical techniques, such as Discrete Weibull distributions, Dirichlet mixture models and COM-Poisson models [3]. Their main goal is to offer robust methods for modelling count data that exhibit under- or over-dispersion. However, while these models have proven useful in a variety of contexts, their utility may not extend seamlessly to time series count data. The analysis of time series count data presents its own set of unique challenges and complexities, frequently necessitating the development of distinct modelling strategies and methodologies.
Addressing the complexities of zero-inflated count data presents an immense challenge in the realm of statistical analysis and time series modelling [4, 5]. These data, which are frequently distinguished by an excess of zero values and a non-standard distribution, are encountered in a variety of fields, ranging from epidemiology and finance to ecology and social sciences [6, 7]. Understanding and effectively modeling zero-inflated time series count data is a practical necessity, as it is used in predicting disease outbreaks [8], analyzing financial anomalies [9], and studying population dynamics [10], among many other domains.
Zero-inflation occurs in data when two distinct processes govern the observed counts: one that generates zeros more frequently than a standard distribution would predict, and another that generates non-zero counts [5]. These complexities necessitate the use of specialized modelling techniques to account for excess zeros, temporal dependencies, and other time series data-specific factors [11]. In this context, choosing an appropriate model is critical because it directly affects prediction accuracy and inference validity [12-14].
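The two-process description above can be illustrated with a minimal Python sketch of a zero-inflated Poisson (ZIP) mixture, one common formalisation of zero-inflation. The sketch assumes that a structural zero occurs with probability pi_zero and that counts otherwise follow a Poisson distribution with mean lam, so that the probability of a zero is pi_zero + (1 - pi_zero) * exp(-lam). The function names, the parameter values, and the omission of temporal dependence are illustrative assumptions rather than the specification used in this study.

```python
import numpy as np

def simulate_zip(n, pi_zero, lam, seed=0):
    """Draw n independent counts from a zero-inflated Poisson mixture."""
    rng = np.random.default_rng(seed)
    structural_zero = rng.random(n) < pi_zero   # process 1: excess zeros
    counts = rng.poisson(lam, size=n)           # process 2: ordinary Poisson counts
    counts[structural_zero] = 0
    return counts

def zip_zero_probability(pi_zero, lam):
    """P(Y = 0) under the ZIP mixture: pi + (1 - pi) * exp(-lam)."""
    return pi_zero + (1.0 - pi_zero) * np.exp(-lam)

# Illustrative parameter values (assumptions, not estimates from any data set).
y = simulate_zip(n=10_000, pi_zero=0.3, lam=2.0)

observed_zero_share = float(np.mean(y == 0))
poisson_zero_share = float(np.exp(-2.0))              # plain Poisson with the same lam
zip_zero_share = float(zip_zero_probability(0.3, 2.0))

print(f"observed share of zeros: {observed_zero_share:.3f}")
print(f"plain Poisson P(Y = 0) : {poisson_zero_share:.3f}")
print(f"ZIP P(Y = 0)           : {zip_zero_share:.3f}")
```

The simulated share of zeros lies close to the ZIP probability and well above what a plain Poisson distribution with the same mean parameter would predict, which is the excess that a zero-inflated model is designed to capture.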
This study considers the critical task of comparing different models for fitting zero-inflated time series count data. Its objectives are to model time series count data using an appropriate model, a step that is often ignored, and to demonstrate that model's robustness and adequacy in fitting count data using various model selection criteria and the scoring function. By examining various methodologies and shedding light on best practices for modelling, the study intends to provide useful insights for scholars and practitioners dealing with zero-inflated time series count data.
This study evaluates several models, including ARIMA, zero-inflated binomial, zero-inflated Poisson, and integer-valued generalized autoregressive conditional heteroscedasticity (INGARCH), and assesses their applicability to stock modelling based on distributions such as the Poisson, the linear and quadratic negative binomial, the double Poisson and the generalized Poisson [15]. The selected models underwent a thorough assessment of their capacity to represent the data's zero-inflated nature, capture temporal dependencies, and produce precise forecasts.
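To make the temporal dependence in an INGARCH model concrete, the following minimal Python sketch simulates a Poisson INGARCH(1,1) process, in which the conditional mean follows lambda_t = omega + alpha * y_{t-1} + beta * lambda_{t-1} and y_t is drawn from a Poisson distribution with mean lambda_t. INGARCH is commonly expanded as integer-valued GARCH. The function name and the parameter values are illustrative assumptions; the sketch is not the estimation procedure used in this study.

```python
import numpy as np

def simulate_ingarch11(n, omega, alpha, beta, seed=0):
    """Simulate a Poisson INGARCH(1,1) series and its conditional means."""
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    rng = np.random.default_rng(seed)
    lam = np.empty(n)
    y = np.empty(n, dtype=np.int64)
    lam[0] = omega / (1.0 - alpha - beta)   # start at the stationary mean
    y[0] = rng.poisson(lam[0])
    for t in range(1, n):
        # Conditional mean responds to the last count and the last mean.
        lam[t] = omega + alpha * y[t - 1] + beta * lam[t - 1]
        y[t] = rng.poisson(lam[t])
    return y, lam

# Illustrative parameter values (assumptions, not fitted estimates).
y, lam = simulate_ingarch11(n=5000, omega=1.0, alpha=0.3, beta=0.4)
stationary_mean = 1.0 / (1.0 - 0.3 - 0.4)

print(f"sample mean of counts: {y.mean():.3f}")
print(f"stationary mean      : {stationary_mean:.3f}")
```

Under the stationarity condition alpha + beta < 1, the long-run mean of the counts is omega / (1 - alpha - beta), which the sample mean of a long simulated series should approximate.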
The findings of this comparative study will help us better understand how to model zero-inflated time series count data and will also assist researchers in finding the best model for their particular use cases [16]. Navigating the wide range of available modelling techniques is crucial because the calibre and dependability of the insights obtained from the data depend on the choice of model [17]. This exploration of zero-inflated time series count data is expected to be enlightening, offering innovative perspectives on the opportunities within the field [18]. The remainder of the paper is organised as follows: Section 2 presents the methodology, Section 3 the results, Section 4 the discussion, and Section 5 the summary and conclusion.