Time Series Forecasting: SARIMA vs Auto ARIMA

Posted by OMEGA MARKOS on April 27, 2020

This blog’s link on Analytics Vidhya:
https://medium.com/analytics-vidhya/time-series-forecasting-sarima-vs-auto-arima-models-f95e76d71d8f?source=friends_link&sk=610c9f14fbdffddafc151fb7ec3723a0

A time series is a series of data points measured at consistent time intervals such as yearly, monthly, daily, hourly and so on. It is time-dependent, and the progress of time is an important aspect of the data set. One of the most common methods used in time series forecasting is the ARIMA model, which stands for Auto Regressive Integrated Moving Average. ARIMA is a model that can be fitted to time series data to predict future points in the series.
We can split the ARIMA acronym into three components: AR, I, and MA.

AR(p) stands for the autoregressive model; the p parameter is an integer that specifies how many lagged observations are used to forecast periods ahead.

I(d) is the differencing part; the d parameter is the number of differencing orders applied to make the series stationary.

MA(q) stands for the moving average model; q is the number of lagged forecast error terms in the prediction equation. SARIMA is seasonal ARIMA and is used with time series that exhibit seasonality.

There are a few steps to implement an ARIMA model:

1. Import the necessary libraries & load the data: The first step for model building is to import the libraries and load the data set.


We will be working on Zillow median house data for a specific zip code.
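Here is a minimal sketch of this step; the file name and column names are placeholders for illustration, not the actual Zillow export used in this post.

```python
# Illustrative setup: load the Zillow median house price data for one zip code.
# The file name and column names below are hypothetical placeholders.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('zillow_median_house_price.csv')
print(df.head())
```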

2. Data Preprocessing: While working with time series data in Python, it’s important to ensure that dates are used as index values and are understood by Python as true “date” objects. We can do this with the pandas Timestamp or to_datetime methods.
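A small sketch of this step, assuming the hypothetical ‘Date’ and ‘MedianPrice’ columns from the loading step above:

```python
# Parse the date column as a true datetime and use it as the index.
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date').sort_index()

# Keep the target series; assume monthly data aligned to month start.
ts = df['MedianPrice'].asfreq('MS')
print(ts.index.dtype)  # datetime64[ns]
```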

3. Check for stationarity: Most time series models require the data to be stationary. A time series is said to be stationary if its statistical properties such as mean, variance & covariance remain constant over time. The formal ways to check for this are to plot the data for a visual analysis and to use a statistical test.

Visual: we can use the decomposition method, which allows us to separately view the seasonality (which could be daily, weekly, annual, etc.), the trend, and the residual, which is the variability in the data set after removing the effects of seasonality and trend.
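A sketch of the decomposition using statsmodels’ seasonal_decompose; period=12 assumes monthly data.

```python
from statsmodels.tsa.seasonal import seasonal_decompose

# Split the series into trend, seasonal and residual components.
decomposition = seasonal_decompose(ts, model='additive', period=12)
fig = decomposition.plot()
fig.set_size_inches(10, 8)
plt.show()
```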


The plot shows that the data has both trend & seasonality. That means it is not stationary.
Statistical test: To confirm our visual observation of the above plot, we will use the Dickey-Fuller hypothesis test.
Null Hypothesis: The series is not stationary.
Alternate Hypothesis: The series is stationary.
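A minimal sketch of the test with statsmodels’ adfuller; the helper below is just for printing the results.

```python
from statsmodels.tsa.stattools import adfuller

def adf_test(series):
    """Run the augmented Dickey-Fuller test and print the key results."""
    stat, p_value = adfuller(series.dropna())[:2]
    print(f'ADF statistic: {stat:.4f}')
    print(f'p-value: {p_value:.4f}')

adf_test(ts)
```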

With a p-value of 1, which is greater than 0.05, we fail to reject the null hypothesis, confirming that the series is not stationary.
4. Make the series stationary & determine the d value:
After the statistical test confirmed that the series is not stationary, the next step is to remove the trend and make the series stationary. One of the most common methods of removing both the trend and the seasonality is differencing, and the number of times differencing is performed to make the series stationary is the d value.

After differencing twice, the test confirms that the data is stationary. That means the d value is 2.
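A sketch of this step, reusing the adf_test helper defined above:

```python
# Difference the series and re-check stationarity after each pass.
ts_diff1 = ts.diff().dropna()        # first difference
ts_diff2 = ts_diff1.diff().dropna()  # second difference

adf_test(ts_diff1)  # in our case, still non-stationary
adf_test(ts_diff2)  # p-value < 0.05 -> stationary, so d = 2
```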
5. Create ACF and PACF plots & determine the p and q values:
The Partial Autocorrelation Function (PACF) gives the partial correlation of a time series with its own lagged values, controlling for the values of the time series at all shorter lags. The Autocorrelation Function (ACF) gives the correlation of a time series with its own lagged values without controlling for the other lags.
For an AR(p) series, the ACF remains strong well past lag p and tails off only gradually as the effect weakens. The PACF, on the other hand, describes the direct relationship between an observation and its lag, so it generally shows no significant correlation for lag values beyond p.
The ACF for an MA(q) process shows a strong correlation with recent values up to lag q, then an immediate decline to minimal or no correlation, while the PACF shows a strong relationship at the early lags and then tails off gradually. Below are the ACF & PACF plots for our stationary data.
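A sketch of the plots using statsmodels, applied to the twice-differenced series from the previous step:

```python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(ts_diff2, lags=24, ax=axes[0])   # suggests the q value
plot_pacf(ts_diff2, lags=24, ax=axes[1])  # suggests the p value
plt.show()
```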


PACF & ACF suggested that AR(2) & MA(2), the next step is to run the ARIMA model using the range of values estimated by the ACF & PACF. Information criterion like AIC (Akaike Information Criterion) or BIC(Bayesian Information Criterion) will be used to choose among correctly fitted models.
6. Fit ARIMA model:
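A sketch of the fit using statsmodels’ SARIMAX with the p, d, q values suggested above; the seasonal order (1, 1, 1, 12) is an illustrative choice for monthly data, not necessarily the one used in this post.

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(ts,
                order=(2, 2, 2),               # (p, d, q) from the steps above
                seasonal_order=(1, 1, 1, 12),  # illustrative seasonal terms
                enforce_stationarity=False,
                enforce_invertibility=False)
results = model.fit(disp=False)
print(results.summary())
print(f'AIC: {results.aic:.2f}  BIC: {results.bic:.2f}')
```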

Auto ARIMA model:
The advantage of using Auto ARIMA over the manual ARIMA workflow is that, after the data preprocessing step, we can skip the intermediate steps & directly fit our model. It uses the AIC (Akaike Information Criterion) & BIC (Bayesian Information Criterion) values generated by trying different combinations of p, d & q values to choose the best model.
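A sketch of the Auto ARIMA fit using the pmdarima package; the search ranges and the monthly seasonal period below are illustrative choices, not necessarily the exact settings used here.

```python
import pmdarima as pm

# Search over (p, d, q) and seasonal terms, ranked by AIC (stepwise search).
auto_model = pm.auto_arima(ts,
                           start_p=0, start_q=0, max_p=3, max_q=3,
                           d=None,                 # let auto_arima pick d
                           seasonal=True, m=12,    # assume monthly seasonality
                           stepwise=True, trace=True,
                           suppress_warnings=True, error_action='ignore')
print(auto_model.summary())
```

The residual plots for the Auto ARIMA model look pretty good.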
Histogram plus estimated density plot:
The red KDE line closely follows the N(0, 1) line. This is a good indication that the residuals are normally distributed.
The Q-Q plot:
Shows that the ordered distribution of residuals (blue dots) follows the linear trend of the samples taken from a standard normal distribution with N(0, 1). This is an indication that the residuals are normally distributed.
The standardized residuals plot:
The residuals over time don’t display any obvious seasonality and appear to be white noise.
The Correlogram plot:
Shows that the time series residuals have a low correlation with lagged versions of themselves. Our model is not perfect yet & needs a few more tweaks. Here are the full settings for our Auto ARIMA model.
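Both the four residual plots described above and the model’s chosen options can be inspected directly on the fitted pmdarima model (a sketch):

```python
# Standardized residuals, histogram + KDE, Q-Q plot and correlogram.
auto_model.plot_diagnostics(figsize=(12, 8))
plt.show()

# Inspect the full set of options chosen for the Auto ARIMA model.
print(auto_model.get_params())
```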

To choose between different fitted models, we compute error metrics such as the Mean Absolute Error, Mean Squared Error and Median Absolute Error & compare them across models.
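A sketch of such a comparison on a hold-out window, assuming the pmdarima model from above; the 24-month split is an illustrative choice.

```python
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error)

# Hold out the last 24 months for evaluation (illustrative split).
train, test = ts.iloc[:-24], ts.iloc[-24:]

auto_model.fit(train)                            # refit on the training window
preds = auto_model.predict(n_periods=len(test))  # forecast the hold-out period

print('MAE  :', mean_absolute_error(test, preds))
print('MSE  :', mean_squared_error(test, preds))
print('MedAE:', median_absolute_error(test, preds))
```

Thanks for reading!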