Blog

How To Perform EDA

image for eda - realpython

Source: https://realpython.com/polars-python/

Performing EDA on a dataset is a difficult and time-consuming process because there are many things you can do while exploring your data.

Basics of Statistics for ML

graph TD;
    Inferential_Statistics["Inferential Statistics"]
    Descriptive_Statistics["Descriptive Statistics"]
    Measure_of_Central_Tendency["Measure of Central Tendency"]
    Weighted_Mean["Weighted Mean"]
    Trimmed_Mean["Trimmed Mean"]
    Measure_of_Dispersion["Measure of Dispersion"]
    Standard_Deviation["Standard Deviation"]
    CV["Coefficient of Variation"]
    Five_Number_Summary["5 Number Summary"]
    Box_Plot["Box Plot / Whisker Plot"]

    Statistics --> Descriptive_Statistics
    Statistics --> Inferential_Statistics
    Descriptive_Statistics --> Measure_of_Central_Tendency
    Descriptive_Statistics --> Measure_of_Dispersion
    Measure_of_Central_Tendency --> Mean
    Measure_of_Central_Tendency --> Median
    Measure_of_Central_Tendency --> Mode
    Mean --> Weighted_Mean
    Mean --> Trimmed_Mean
    Measure_of_Dispersion --> UniVariate
    Measure_of_Dispersion --> BiVariate
    UniVariate --> Range
    UniVariate --> Variance
    UniVariate --> Standard_Deviation
    UniVariate --> CV
    UniVariate --> Five_Number_Summary
    Five_Number_Summary --> Percentile
    Five_Number_Summary --> Box_Plot
    BiVariate --> Covariance
    BiVariate --> Correlation
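
As a rough illustration of the measures in the graph above, here is a minimal sketch that uses only Python's standard `statistics` module. The sample data (`heights_cm`, `weights_kg`) and the helper names (`weighted_mean`, `trimmed_mean`, `covariance`, `correlation`, `describe`) are assumptions made up for this example, not part of any library.

```python
import statistics


def weighted_mean(values, weights):
    """Mean where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values."""
    ordered = sorted(values)
    return statistics.mean(ordered[k:len(ordered) - k])


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


def describe(values):
    """Univariate summary: central tendency, dispersion, 5-number summary."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    mean = statistics.mean(values)
    median = statistics.median(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    return {
        "mean": mean,
        "median": median,
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "stdev": stdev,
        "cv": stdev / mean,  # coefficient of variation
        "five_number_summary": (min(values), q1, median, q3, max(values)),
    }


# Hypothetical sample data, chosen only for illustration.
heights_cm = [150, 152, 155, 155, 158, 160, 162, 165, 170, 195]
weights_kg = [50, 52, 54, 55, 57, 60, 61, 64, 68, 90]

summary = describe(heights_cm)
for name, value in summary.items():
    print(name, value)

# The outlier (195) pulls the plain mean up; trimming one value per side reduces that.
print("trimmed mean:", trimmed_mean(heights_cm, k=1))
print("covariance:", covariance(heights_cm, weights_kg))
print("correlation:", correlation(heights_cm, weights_kg))
```

The 195 cm value is there on purpose: comparing the mean, the median and the trimmed mean on the same list shows how differently each measure reacts to an outlier.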

🧪 Introduction To Hypothesis Testing

🥊 Null V/S Alternative Hypothesis

| Parameter | Null Hypothesis | Alternative Hypothesis |
| --- | --- | --- |
| Definition | A statement that there is no relationship between the two variables. | A statement that there is some statistical relationship between the two variables. |
| What is it? | Generally, researchers try to reject or disprove it. | Researchers try to find evidence that supports it. |
| Testing Process | Indirect and implicit | Direct and explicit |
| p-value | The null hypothesis is rejected if the p-value is less than the alpha value; otherwise, we fail to reject it. | The alternative hypothesis is supported if the p-value is less than the alpha value; otherwise, it is not supported. |
| Notation | \(H_0\) | \(H_1\) |
| Symbol Used | Equality symbol (=, ≥, ≤) | Inequality symbol (≠, <, >) |
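
The p-value rule from the table can be sketched with a two-sided one-sample z-test. This is only one of many possible tests, picked here because it needs nothing beyond the standard library. It assumes the population standard deviation `sigma` is known, and the function names `z_test_two_sided` and `decide` and all the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_two_sided(sample, mu0, sigma):
    """Test H0: mu = mu0 against H1: mu != mu0, assuming sigma is known."""
    standard_error = sigma / sqrt(len(sample))
    z = (mean(sample) - mu0) / standard_error
    # Two-sided p-value: probability of a |z| at least this large under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical measurements; H0 claims the true mean is 10.
sample = [10.8, 11.2, 10.5, 11.0, 10.9, 11.4, 10.7, 11.1]
z, p = z_test_two_sided(sample, mu0=10, sigma=0.5)
print(f"z = {z:.2f}, p-value = {p:.4g}, decision: {decide(p)}")
```

Note that the decision is phrased as "fail to reject" rather than "accept": a large p-value means the data are compatible with \(H_0\), not that \(H_0\) has been proven.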

Learn Web Scraping

web scraping - real python

Source: https://realpython.com/python-web-scraping-practical-introduction

Web scraping is an essential skill for programmers who need to gather data from websites, and for data scientists in particular it is a go-to way to collect data. In Python, we can use bs4.BeautifulSoup or selenium to scrape websites.
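
To show the basic idea of pulling structured data out of HTML, here is a minimal sketch. It deliberately swaps in the standard library's `html.parser` instead of bs4 or selenium so that it runs without installing anything, and it parses an inline string instead of fetching a live page. The `LinkCollector` class and the `SAMPLE_HTML` listing are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from every <a href="..."> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None  # href of the <a> tag we are inside, if any
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((text, self._current_href))
            self._current_href = None


# Hypothetical page content standing in for a downloaded listing page.
SAMPLE_HTML = """
<ul class="listings">
  <li><a href="/flat/101">2 BHK Flat</a> <span class="price">45 Lac</span></li>
  <li><a href="/flat/102">3 BHK Flat</a> <span class="price">72 Lac</span></li>
  <li><a>No link here</a></li>
</ul>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
for text, href in collector.links:
    print(text, "->", href)
```

With bs4, the same extraction is usually much shorter (for instance, looping over `soup.find_all("a")`), which is a large part of why it is the more common choice for real projects.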

You can see some of my projects where I scraped websites like 99acres.com, flipkart.com, and housing.com to gather useful data for my Data Science projects, such as arv-anshul/campusx-real-estate.

I learned web scraping entirely from YouTube.

🌴 Tree VS Regression Models

tree-vs-regression-models

Source: www.freecodecamp.org

Tree-based models and regression models are widely used Machine Learning models, so the more you know about them, the better. Many concepts from these models are also borrowed by advanced Machine Learning models like Gradient Boosting, XGBoost, etc.

These models are also a favorite of :fontawesome-user-tie: interviewers, who ask many interview questions about them. This blog mainly focuses on tree-based models.