## A package is a collection of modules.
## A library is a collection of packages.

Hi there. Today, we are going to see how to use the SweetViz library in Python, which lets you perform powerful Exploratory Data Analysis (EDA) on your dataset. So, let's get started.

First, you will have to install this package with pip, as it is not a built-in Python package. You can do so with pip install sweetviz from the command prompt, or with !pip install sweetviz from a Jupyter notebook environment.
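If you are not sure whether the install worked, a quick check from Python can help. The snippet below is only a minimal sketch: `is_installed` is a hypothetical helper name, not part of SweetViz, and it relies solely on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(module_name) is not None


if not is_installed("sweetviz"):
    print("sweetviz is missing; run: pip install sweetviz")
```

In a notebook, you can run this cell before the import cell so that a missing package shows up as a clear message instead of an ImportError.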

I will be using the USA Housing data in this example.

!pip install sweetviz
Collecting sweetviz
  Downloading https://files.pythonhosted.org/packages/8f/bd/f4454adfe1d3bbd04892d6172348ca215fa62d59fb09c1ac6b8a233341d3/sweetviz-1.0a7-py3-none-any.whl (323kB)
Requirement already satisfied: scipy>=1.3.2 in c:\users\sharan babu\anaconda3\lib\site-packages (from sweetviz) (1.4.1)
Collecting tqdm>=4.43.0 (from sweetviz)
  Downloading https://files.pythonhosted.org/packages/f3/76/4697ce203a3d42b2ead61127b35e5fcc26bba9a35c03b32a2bd342a4c869/tqdm-4.46.1-py2.py3-none-any.whl (63kB)
Collecting pandas!=1.0.0,!=1.0.1,!=1.0.2,>=0.25.3 (from sweetviz)
  Downloading https://files.pythonhosted.org/packages/1d/eb/b4f68f54ad287d583c9c3b3c77f865615f832f092810f20d2b44498cd06c/pandas-1.0.4-cp37-cp37m-win_amd64.whl (8.7MB)
Collecting matplotlib>=3.1.3 (from sweetviz)
  Downloading https://files.pythonhosted.org/packages/b4/4d/8a2c06cb69935bb762738a8b9d5f8ce2a66be5a1410787839b71e146f000/matplotlib-3.2.1-cp37-cp37m-win_amd64.whl (9.2MB)
Collecting importlib-resources>=1.2.0 (from sweetviz)
  Downloading https://files.pythonhosted.org/packages/b6/03/1865fdd49ec9a938f9f84b255d3d37863df9fbd18b48c1c3f761040cbf13/importlib_resources-2.0.0-py2.py3-none-any.whl
Collecting jinja2>=2.11.1 (from sweetviz)
  Downloading https://files.pythonhosted.org/packages/30/9e/f663a2aa66a09d838042ae1a2c5659828bb9b41ea3a6efa20a20fd92b121/Jinja2-2.11.2-py2.py3-none-any.whl (125kB)
Requirement already satisfied: numpy>=1.16.0 in c:\users\sharan babu\anaconda3\lib\site-packages (from sweetviz) (1.16.4)
Requirement already satisfied: pytz>=2017.2 in c:\users\sharan babu\anaconda3\lib\site-packages (from pandas!=1.0.0,!=1.0.1,!=1.0.2,>=0.25.3->sweetviz) (2019.1)
Requirement already satisfied: python-dateutil>=2.6.1 in c:\users\sharan babu\anaconda3\lib\site-packages (from pandas!=1.0.0,!=1.0.1,!=1.0.2,>=0.25.3->sweetviz) (2.8.0)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\sharan babu\anaconda3\lib\site-packages (from matplotlib>=3.1.3->sweetviz) (1.1.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in c:\users\sharan babu\anaconda3\lib\site-packages (from matplotlib>=3.1.3->sweetviz) (2.4.0)
Requirement already satisfied: cycler>=0.10 in c:\users\sharan babu\anaconda3\lib\site-packages (from matplotlib>=3.1.3->sweetviz) (0.10.0)
Requirement already satisfied: zipp>=0.4; python_version < "3.8" in c:\users\sharan babu\anaconda3\lib\site-packages (from importlib-resources>=1.2.0->sweetviz) (0.5.1)
Requirement already satisfied: importlib-metadata; python_version < "3.8" in c:\users\sharan babu\anaconda3\lib\site-packages (from importlib-resources>=1.2.0->sweetviz) (0.17)
Requirement already satisfied: MarkupSafe>=0.23 in c:\users\sharan babu\anaconda3\lib\site-packages (from jinja2>=2.11.1->sweetviz) (1.1.1)
Requirement already satisfied: six>=1.5 in c:\users\sharan babu\anaconda3\lib\site-packages (from python-dateutil>=2.6.1->pandas!=1.0.0,!=1.0.1,!=1.0.2,>=0.25.3->sweetviz) (1.12.0)
Requirement already satisfied: setuptools in c:\users\sharan babu\anaconda3\lib\site-packages (from kiwisolver>=1.0.1->matplotlib>=3.1.3->sweetviz) (41.0.1)
Installing collected packages: tqdm, pandas, matplotlib, importlib-resources, jinja2, sweetviz
  Found existing installation: tqdm 4.32.1
    Uninstalling tqdm-4.32.1:
      Successfully uninstalled tqdm-4.32.1
  Found existing installation: pandas 0.24.2
    Uninstalling pandas-0.24.2:
      Successfully uninstalled pandas-0.24.2
  Found existing installation: matplotlib 3.1.0
    Uninstalling matplotlib-3.1.0:
      Successfully uninstalled matplotlib-3.1.0
  Found existing installation: Jinja2 2.10.1
    Uninstalling Jinja2-2.10.1:
      Successfully uninstalled Jinja2-2.10.1
Successfully installed importlib-resources-2.0.0 jinja2-2.11.2 matplotlib-3.2.1 pandas-1.0.4 sweetviz-1.0a7 tqdm-4.46.1
import numpy as np
import pandas as pd
import sweetviz
df = pd.read_csv(r"C:\Users\Sharan Babu\Desktop\Data science\original\Refactored_Py_DS_ML_Bootcamp-master\11-Linear-Regression\USA_housing.csv")
df.head()
# In this dataset, the 'Price' column is the target feature (dependent variable).
| | Avg. Area Income | Avg. Area House Age | Avg. Area Number of Rooms | Avg. Area Number of Bedrooms | Area Population | Price | Address |
|---|---|---|---|---|---|---|---|
| 0 | 79545.458574 | 5.682861 | 7.009188 | 4.09 | 23086.800503 | 1.059034e+06 | 208 Michael Ferry Apt. 674\nLaurabury, NE 3701... |
| 1 | 79248.642455 | 6.002900 | 6.730821 | 3.09 | 40173.072174 | 1.505891e+06 | 188 Johnson Views Suite 079\nLake Kathleen, CA... |
| 2 | 61287.067179 | 5.865890 | 8.512727 | 5.13 | 36882.159400 | 1.058988e+06 | 9127 Elizabeth Stravenue\nDanieltown, WI 06482... |
| 3 | 63345.240046 | 7.188236 | 5.586729 | 3.26 | 34310.242831 | 1.260617e+06 | USS Barnett\nFPO AP 44820 |
| 4 | 59982.197226 | 5.040555 | 7.839388 | 4.23 | 26354.109472 | 6.309435e+05 | USNS Raymond\nFPO AE 09386 |

## Analyzing a DataFrame

analysis = sweetviz.analyze([df,"EDA"], target_feat='Price')
:FEATURES DONE:                    |█████████████████████| [100%]   00:06  -> (00:00 left)
:PAIRWISE DONE:                    |█████████████████████| [100%]   00:00  -> (00:00 left)
Creating Associations graph... DONE!
type(analysis)
sweetviz.dataframe_report.DataframeReport
analysis.show_html('EDA.html')

This is an amazing visualization library: you instantly get insights into your data that you could have worked out manually, but only with a lot more time.
For numerical features, you get a point plot, a histogram, the number of missing values, the number of distinct values, the quartile values, and other useful information such as the skewness of the column.
For categorical features, along with the number of distinct and missing values, you
Additionally, you get the 'Associations', or pairwise correlations between two variables, which help you judge how strongly each feature relates to the target.
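To get a feel for what "doing it manually" involves, here is a rough pandas sketch that computes a few of the same numbers by hand. It is not how SweetViz works internally: the `toy` DataFrame and the `summarize_numeric` helper are hypothetical stand-ins, and plain Pearson correlation is my own choice for the association with the target.

```python
import pandas as pd

# Hypothetical toy data standing in for the housing DataFrame.
toy = pd.DataFrame({
    "Rooms": [5.0, 6.0, 7.0, 8.0, None],
    "Income": [50.0, 60.0, 65.0, 80.0, 90.0],
    "Price": [100.0, 120.0, 140.0, 160.0, 180.0],
})


def summarize_numeric(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-column statistics for the numeric columns of a DataFrame."""
    numeric = frame.select_dtypes(include="number")
    return pd.DataFrame({
        "missing": numeric.isna().sum(),   # count of NaN entries
        "distinct": numeric.nunique(),     # NaN is not counted as a value
        "q1": numeric.quantile(0.25),
        "median": numeric.median(),
        "q3": numeric.quantile(0.75),
        "skew": numeric.skew(),
    })


summary = summarize_numeric(toy)

# Pearson correlation of every numeric column with the target column.
numeric_cols = toy.select_dtypes(include="number")
target_corr = numeric_cols.corr()["Price"].drop("Price")

print(summary)
print(target_corr)
```

Even this small subset takes a fair amount of code, and it produces no plots at all, which is the convenience the report gives you.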

You can also use this library to compare two DataFrames, say, your training set and test set, and infer some meaning from the comparison.

train = df[:3000]
test = df[3000:]
# Consider 'test' to be the Test data.
# The command to perform EDA comparison is:
analysis = sweetviz.compare([train,"Train"],[test,"Test"], "Price") # Price is the target variable common to both tables
# Now you can view your results.
analysis.show_html('EDA2.html')
:FEATURES DONE:                    |█████████████████████| [100%]   00:08  -> (00:00 left)
:PAIRWISE DONE:                    |█████████████████████| [100%]   00:00  -> (00:00 left)
Creating Associations graph... DONE!

Now you can see a comparison between the Train and Test datasets, differentiated by color, for all the parameters discussed above.
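For a quick numeric sanity check alongside the visual report, you could also compare summary statistics of the two splits directly in pandas. The sketch below uses a hypothetical `toy` DataFrame and a hypothetical `compare_means` helper; it only looks at column means, which is far less than the report shows.

```python
import pandas as pd

# Hypothetical toy frame; in the walkthrough this would be the housing data.
toy = pd.DataFrame({
    "Income": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "Price": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Same slicing style as in the walkthrough: first rows vs. the remainder.
train, test = toy[:4], toy[4:]


def compare_means(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Side-by-side numeric column means for two frames, plus their difference."""
    out = pd.DataFrame({
        "train_mean": a.mean(numeric_only=True),
        "test_mean": b.mean(numeric_only=True),
    })
    out["diff"] = out["test_mean"] - out["train_mean"]
    return out


comparison = compare_means(train, test)
print(comparison)
```

The toy data is sorted on purpose, so the means of the two splits differ a lot. If you see a gap like that on real data, it may mean that slicing by row position did not give representative splits, and shuffling first (for example with `df.sample(frac=1, random_state=0)`) could be worth trying.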
Therefore, this is a handy module.