Time Series Analysis

Time series forecasting: SARIMA vs Auto ARIMA!

This blog’s link on Analytics-vidhya:
https://medium.com/analytics-vidhya/time-series-forecasting-sarima-vs-auto-arima-models-f95e76d71d8f?source=friends_link&sk=610c9f14fbdffddafc151fb7ec3723a0

Posted by OMEGA MARKOS on April 27, 2020

The Exploratory Data Analysis steps

Exploratory Data Analysis is a very important part of data science. It is the process of developing a deeper understanding of the data set through initial investigations that discover patterns, spot anomalies, and check assumptions. It is good practice to understand the data first and gather insight from it before starting to develop a model. It can also help in formulating the right questions for the data analysis. I approached the exploratory data analysis process in the following steps:

1. Importing important libraries and reading the data
2. Data cleaning
3. Creating tables and plots
4. Exploring data correlation
5. Identifying important features

1. Importing important libraries and reading the data
At this step, I imported the major libraries and read the data set. Using the .info() method, I learned that the data set has 21597 houses with 21 variables describing the size, date, location, condition, and features of the houses. This method also shows the column names and their corresponding data types.
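As a minimal sketch of this step, the snippet below reads a tiny inline CSV with pandas and inspects it. The sample rows and the column names other than ‘waterfront’ and ‘view’ are hypothetical stand-ins, not the actual housing data; with a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for the housing file
# (the real data set has 21597 rows and 21 columns).
csv_text = """id,date,price,sqft_living,waterfront,view
1,10/13/2014,221900,1180,,0
2,12/9/2014,538000,2570,0,0
3,2/25/2015,180000,770,0,
"""

# With a real file this would be pd.read_csv("<path to the csv>").
df = pd.read_csv(io.StringIO(csv_text))

# Prints the column names, non-null counts, and data types.
df.info()

# Number of rows (houses) and columns (variables).
n_rows, n_cols = df.shape
print(n_rows, n_cols)
```

The non-null counts reported by .info() are also a quick first hint about which columns contain missing values.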
2. Data Cleaning
Missing values:
Running the .isna() method, I learned that the data has a substantial number of missing values in the ‘waterfront’ and ‘renovated’ columns and a few missing values in the ‘view’ column. From my experience as a coordinator of a very challenging data collection process, I am a strong believer in “Do your best to keep your data!” Throwing away data is costly and reduces the accuracy of the analysis, especially when the data set is small. I also believe it is very important to choose the right imputation method for the missing values; otherwise, the wrong values will distort the results as well. I replaced the missing values with the mode or median of each column, which is 0 in every case.
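The imputation described above could look roughly like the following sketch. The toy values are made up for illustration, and the choice of which column gets the mode and which gets the median is an assumption, since the post does not spell it out.

```python
import numpy as np
import pandas as pd

# Toy data with gaps in the three columns named in the post (values are made up).
df = pd.DataFrame({
    "waterfront": [0, np.nan, 0, 1, np.nan, 0],
    "renovated": [0, 0, np.nan, 2001, np.nan, 0],
    "view": [0, 0, 0, np.nan, 2, 0],
})

# Count the missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

# Assumption: mode for the category-like columns, median for 'renovated'.
df["waterfront"] = df["waterfront"].fillna(df["waterfront"].mode()[0])
df["view"] = df["view"].fillna(df["view"].mode()[0])
df["renovated"] = df["renovated"].fillna(df["renovated"].median())

# No missing values should remain after imputation.
print(df.isna().sum())
```

In this toy frame, as in the post, both the mode and the median work out to 0, so every gap is filled with 0 and no rows are dropped.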
Extraneous values:
