The Exploratory Data Analysis steps
Exploratory Data Analysis (EDA) is a very important part of data science. It is the process of developing a deeper understanding of a data set through initial investigations that discover patterns, spot anomalies, and check assumptions. It is good practice to understand the data and gather insight from it before you start developing a model, and doing so also helps you formulate the right questions for the analysis. I approached the exploratory data analysis process in the following steps:
- Importing important libraries and reading the data
- Data Cleaning
- Create tables and plots
- Explore data correlation
- Identification of important features
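The correlation and feature-identification steps can be sketched with pandas. This is a minimal illustration under assumptions, not the author's code: the toy DataFrame is hypothetical, 'price' is assumed to be the target column, and Pearson correlation (the pandas default) is used to rank the other numeric columns against it.

```python
import pandas as pd

# Hypothetical toy data standing in for the housing data set;
# 'price' is assumed to be the target variable.
df = pd.DataFrame({
    "price": [221900, 538000, 180000, 604000, 510000],
    "sqft_living": [1180, 2570, 770, 1960, 1680],
    "bedrooms": [3, 3, 2, 4, 3],
    "condition": [3, 3, 3, 5, 3],
})

# Pearson correlation of every numeric column with the target,
# sorted so the most strongly (positively) related features come first.
corr_with_price = df.corr()["price"].drop("price").sort_values(ascending=False)
print(corr_with_price)
```

Ranking by correlation with the target is only a first screen: it captures linear relationships and can miss features that matter through non-linear effects or interactions.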
1. Importing important libraries and reading the data
In this step, I imported the main libraries and read the data set. The .info() method showed that the data set contains 21597 houses and 21 variables describing the size, date, location, condition, and features of the houses. It also lists the column names and their data types.
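A minimal sketch of this step, assuming pandas is the library used to read the data. The original file name is not given, so a tiny in-memory CSV with a few hypothetical columns and rows stands in for it; in practice you would pass the path to your own CSV file to pd.read_csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for the housing CSV (the real file is not named);
# replace io.StringIO(csv_text) with the path to your own file.
csv_text = """date,price,bedrooms,sqft_living,waterfront,view
20141013T000000,221900,3,1180,,0
20141209T000000,538000,3,2570,0,
20150225T000000,180000,2,770,0,0
"""

df = pd.read_csv(io.StringIO(csv_text))

# .info() prints the number of rows, the column names,
# the non-null count of each column, and its dtype.
df.info()
```

Because empty fields are read as NaN, the non-null counts printed by .info() already hint at which columns have missing values.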
2. Data Cleaning
Missing values:
Counting missing values with the .isna() method showed that the data has substantial missing values in the ‘waterfront’ and ‘renovated’ columns and a few in the ‘view’ column. From my experience as a coordinator of a very challenging data collection process, I am a strong believer in “Do your best to keep your data!” Throwing away data is costly and reduces the accuracy of the analysis, especially when the data set is small. I also believe it is very important to choose the right imputation method for the missing values; otherwise, the imputed values will distort the results as well. I replaced the missing values with the mode and median of the columns, all of which are 0.
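The counting and imputation described above can be sketched as follows. The DataFrame is hypothetical toy data, and which column gets the mode and which the median is an assumption (the text does not say): here the mode fills the categorical-like 'waterfront' and 'view' columns and the median fills the numeric 'renovated' column. With this toy data, every fill value comes out as 0, matching the situation described.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in the columns named in the text
df = pd.DataFrame({
    "waterfront": [0, np.nan, 0, 1, np.nan],
    "view": [0, 0, np.nan, 2, 0],
    "renovated": [0, np.nan, 0, 1991, 0],
})

# .isna() flags each missing cell; .sum() totals the flags per column
missing_before = df.isna().sum()
print(missing_before)

# Assumed choice: mode for categorical-like columns, median for the numeric one
for col in ["waterfront", "view"]:
    df[col] = df[col].fillna(df[col].mode()[0])
df["renovated"] = df["renovated"].fillna(df["renovated"].median())

print(df.isna().sum())  # every column should now report 0 missing values
```

Assigning the result of fillna back to the column, rather than calling it with inplace=True on a selected column, avoids pandas' chained-assignment pitfalls.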
Extraneous values:
Posted by OMEGA MARKOS on April 27, 2020