Time Series Analysis


ARIMA vs Auto ARIMA

Time series forecasting: SARIMA vs Auto ARIMA

Link to this blog post on Analytics Vidhya:
https://medium.com/analytics-vidhya/time-series-forecasting-sarima-vs-auto-arima-models-f95e76d71d8f?source=friends_link&sk=610c9f14fbdffddafc151fb7ec3723a0


The Exploratory Data Analysis steps

Exploratory Data Analysis (EDA) is a very important part of data science. It is the process of developing a deeper understanding of a data set through initial investigations that uncover patterns, spot anomalies, and check assumptions. It is good practice to understand the data and gather insight from it before you start developing a model. EDA also helps you formulate the right questions for the analysis. I approached the exploratory data analysis process in the following steps:

  1. Importing libraries and reading data
  2. Cleaning data
  3. Creating tables and plots
  4. Exploring data correlation
  5. Identifying important features
    1. Importing libraries and reading data
    At this step, I imported the major libraries and read the data set. Using the .info() method, I learned that the dataset contains 21,597 houses and 21 variables describing the size, date, location, condition, and features of each house. This method also shows the column names and their corresponding data types.
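As a minimal sketch of this step, the snippet below reads a CSV with pandas and inspects it with .info(). The three-row inline sample and its column names are hypothetical stand-ins for the real housing file, which is not reproduced here; in practice you would pass the path of your own CSV to pd.read_csv.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the housing CSV.
# The column names are assumptions chosen for illustration only.
csv_text = """id,date,price,sqft_living,waterfront,view
1,10/13/2014,221900.0,1180,,0.0
2,12/9/2014,538000.0,2570,0.0,0.0
3,2/25/2015,180000.0,770,0.0,
"""

# With a real file this would be pd.read_csv("<path to your CSV>").
df = pd.read_csv(io.StringIO(csv_text))

# .info() prints the row count, the column names, the non-null count
# per column, and each column's data type.
df.info()

# .shape gives the same row and column counts as a tuple.
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")
```

The non-null counts printed by .info() are also a first hint of which columns have missing values.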
    2. Data Cleaning
    Missing values:
    Running the .isna() method, I learned that the data has a substantial number of missing values in the ‘waterfront’ and ‘renovated’ columns and a few missing values in the ‘view’ column. From my experience as a coordinator for a very challenging data collection process, I am a strong believer in “Do your best to keep your data!” Throwing away data is costly and hurts the accuracy of the analysis, especially when the data set is small. I also believe it is very important to choose the right imputation method for the missing values; otherwise, the imputed values will distort the results as well. I replaced the missing values with the mode or median of each column, all of which are 0.
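The imputation described above can be sketched as follows. This is an illustration under stated assumptions, not the original notebook code: the small DataFrame is made up, and the choice of which column gets the mode and which gets the median is a guess, since the text only says that the mode and median were used and that all of them are 0.

```python
import numpy as np
import pandas as pd

# Made-up sample with gaps in the three columns named in the text.
df = pd.DataFrame({
    "waterfront": [0.0, np.nan, 0.0, 1.0, np.nan, 0.0],
    "renovated": [0.0, 0.0, np.nan, 1991.0, 0.0, np.nan],
    "view": [0.0, 0.0, 2.0, np.nan, 0.0, 0.0],
})

# Count the missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

# Assumption: mode for the binary 'waterfront' flag ...
df["waterfront"] = df["waterfront"].fillna(df["waterfront"].mode()[0])

# ... and median for the other two columns.
for col in ["renovated", "view"]:
    df[col] = df[col].fillna(df[col].median())

# After imputation no missing values remain.
print(df.isna().sum())
```

In this sample, as in the data set described above, the mode and the medians are all 0, so every gap is filled with 0 and no rows are dropped.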
    Extraneous values: