How To Perform EDA
Performing EDA on a dataset is a difficult and time-consuming process because there are many things you can do while exploring your data.
graph TD;
Inferential_Statistics["Inferential Statistics"]
Descriptive_Statistics["Descriptive Statistics"]
Measure_of_Central_Tendency["Measure of Central Tendency"]
Weighted_Mean["Weighted Mean"]
Trimmed_Mean["Trimmed Mean"]
Measure_of_Dispersion["Measure of Dispersion"]
Standard_Deviation["Standard Deviation"]
CV["Coefficient of Variation"]
Five_Number_Summary["5 Number Summary"]
Box_Plot["Box Plot / Whisker Plot"]
Statistics --> Descriptive_Statistics
Statistics --> Inferential_Statistics
Descriptive_Statistics --> Measure_of_Central_Tendency
Descriptive_Statistics --> Measure_of_Dispersion
Measure_of_Central_Tendency --> Mean
Measure_of_Central_Tendency --> Median
Measure_of_Central_Tendency --> Mode
Mean --> Weighted_Mean
Mean --> Trimmed_Mean
Measure_of_Dispersion --> UniVariate
Measure_of_Dispersion --> BiVariate
UniVariate --> Range
UniVariate --> Variance
UniVariate --> Standard_Deviation
UniVariate --> CV
UniVariate --> Five_Number_Summary
Five_Number_Summary --> Percentile
Five_Number_Summary --> Box_Plot
BiVariate --> Covariance
BiVariate --> Correlation
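The descriptive measures in the diagram above can be sketched with Python's standard library. This is a minimal illustration, not a definitive implementation: the sample lists `x` and `y`, the helper names, and the choice of sample (n - 1) formulas are all assumptions made for the example.

```python
import statistics
from math import sqrt

def weighted_mean(values, weights):
    """Mean where each value contributes in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def trimmed_mean(values, proportion):
    """Drop `proportion` of the observations from each end, then average."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    return statistics.mean(ordered[k:len(ordered) - k] if k else ordered)

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a box plot draws."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean."""
    return statistics.stdev(values) / statistics.mean(values)

def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))

# Hypothetical sample data; 40 and 30 act as outliers.
x = [2, 4, 4, 5, 7, 9, 40]
y = [1, 3, 2, 5, 6, 8, 30]

central = {
    "mean": statistics.mean(x),
    "median": statistics.median(x),
    "mode": statistics.mode(x),
    "trimmed_mean": trimmed_mean(x, 0.15),
}
dispersion = {
    "range": max(x) - min(x),
    "variance": statistics.variance(x),
    "std_dev": statistics.stdev(x),
    "cv": coefficient_of_variation(x),
    "five_number_summary": five_number_summary(x),
}
bivariate = {"covariance": covariance(x, y), "correlation": correlation(x, y)}
```

On this sample the outlier pulls the mean well above the median, while the trimmed mean stays close to it, which is the usual reason to report more than one measure of central tendency.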
Parameter | Null Hypothesis | Alternative Hypothesis |
---|---|---|
Definition | A null hypothesis states that there is no relationship between the two variables. | An alternative hypothesis states that there is some statistical relationship between the two variables. |
What is it? | Generally, researchers try to reject or disprove it. | Researchers try to find evidence that supports it. |
Testing Process | Indirect and Implicit | Direct and Explicit |
p-value | The null hypothesis is rejected if the p-value is less than the alpha-value; otherwise, we fail to reject it. | The alternative hypothesis is supported if the p-value is less than the alpha-value; otherwise, it is not supported. |
Notation | \(H_0\) | \(H_1\) |
Symbol | Equality symbol (=, ≥, ≤) | Inequality symbol (≠, <, >) |
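The p-value row of the table can be illustrated with a small worked example. The sketch below uses a two-sided one-sample z-test built from the standard library only. The sample values, the hypothesized mean `mu0`, the assumed known `sigma`, and the function names are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: mu = mu0 against H1: mu != mu0.

    Assumes the population standard deviation `sigma` is known.
    """
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def decide(p_value, alpha=0.05):
    """Apply the decision rule: compare the p-value with alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Hypothetical measurements with sample mean 54.
sample = [52, 55, 49, 58, 54, 53, 57, 56, 51, 55]

z, p = one_sample_z_test(sample, mu0=50, sigma=3)
decision = decide(p)

# Testing against the sample's own mean gives z = 0 and a p-value of 1.
z_same, p_same = one_sample_z_test(sample, mu0=54, sigma=3)
decision_same = decide(p_same)
```

Here `decision` comes out as a rejection of H0 because the sample mean sits about four standard errors above `mu0`, while `decision_same` does not reject.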
Web scraping is an essential skill for programmers who need to gather data from websites, and for data scientists in particular it is a go-to tool. In Python, we can use `bs4.BeautifulSoup` or `selenium` to scrape most websites.
You can see some of my projects where I scraped websites like 99acres.com, flipkart.com, and housing.com and gathered useful data for my data science projects, such as arv-anshul/campusx-real-estate.
I learned web scraping entirely from YouTube.
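As a rough sketch of the `bs4.BeautifulSoup` approach mentioned above, the example below parses an inlined HTML snippet rather than a live page. The markup, class names (`listing`, `title`, `price`), and the `parse_listings` helper are hypothetical and do not reflect the structure of any real site or of the projects listed.

```python
from bs4 import BeautifulSoup

# Hypothetical listing page, inlined so the example needs no network access.
HTML = """
<html><body>
  <div class="listing">
    <h2 class="title">2 BHK Flat</h2><span class="price">45 Lac</span>
  </div>
  <div class="listing">
    <h2 class="title">3 BHK Villa</h2><span class="price">1.2 Cr</span>
  </div>
</body></html>
"""

def parse_listings(html):
    """Extract one dict per listing card from the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="listing"):
        title = card.find("h2", class_="title").get_text(strip=True)
        price = card.find("span", class_="price").get_text(strip=True)
        rows.append({"title": title, "price": price})
    return rows

listings = parse_listings(HTML)
```

For a real site, the HTML string would first be downloaded, and pages that render their content with JavaScript are where `selenium` tends to be the better fit.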
Tree-based models and regression models are widely used machine learning models, so the more you know about them, the better. Many of their concepts are also borrowed by advanced machine learning models such as Gradient Boosting and XGBoost.
These models are also a favorite of :fontawesome-user-tie: interviewers, who ask many interview questions about them. This blog mainly focuses on tree-based models.