DSMP by CampusX¶
Extra Sessions¶
Interview Questions on Statistics¶
Description
Feedback Form - https://forms.gle/5vGSrcVs1LrKd8Mo9
Model Explainability¶
Description
Feedback Form - https://forms.gle/14bjmVMSL1GUFPXy5
Interview Questions on Regression¶
Description
Feedback Form - https://forms.gle/GFbzkKyStkFPn6t9A
How to Solve a Banking Problem using ML¶
Description
Feedback Form - https://forms.gle/2E94oLoMsWwrysvB8
Session on ResNET Paper Discussion¶
Description
Feedback Form - https://forms.gle/7p2wCSjoC7UDRox99
Introduction to PyTorch¶
Description
Feedback Form - https://forms.gle/SeGGzLa1pUZWvTvP9
Named Entity Recognition using NLTK & Spacy¶
Description
Feedback Form - https://forms.gle/M1KszATvuEgXWJW57
Latent Dirichlet Allocation (LDA)¶
Description
Feedback Form - https://forms.gle/EQk2F3mg1Hgqjgyx7
Introduction to PowerBI¶
Description
Feedback Form - https://forms.gle/8YMKA71gS9bos6Ns8
Anomaly Detection¶
Description
Feedback Form - https://forms.gle/2Ck2kFvSWqRptiaR6
Prompt Engineering¶
Description
Interview Questions on Tree Based Models¶
Description
Feedback Form - https://forms.gle/naq7hYB4wr9B1jCq8
Multioutput and Multiclass Classification Problem¶
Description
Feedback Form - https://forms.gle/UPQVWEQ7vzWPh3PP8
EKYC Using Computer Vision¶
Description
Feedback Form - https://forms.gle/GAyVY3hqZK4eLK3D7
Time Series Forecasting¶
Description
Feedback Form - https://forms.gle/cKyz2gLPLDNsFRv49
A/B Testing¶
Description
Feedback Form - https://forms.gle/qXj3NkuJGMVDvEY86
Langchain¶
Description
Feedback Form - https://forms.gle/mtLS482RCXEfGu1R6
FastAPI¶
Description
Feedback Form - https://forms.gle/DP2xVdUxscsAcoPm8
Vertex AI¶
Description
Feedback - https://forms.gle/dvWxrHCHNccbHi6p9
RAG¶
Description
Feedback - https://forms.gle/NdHpDDgw9beYZm786
MLOps Revisited¶
Session 1 MLOps Revisited - Introduction to MLOps¶
Description
Session 2 on MLOps Revisited - MLOps Tools Stack¶
Description
Interview Questions¶
Session 1 on Interview Questions on Statistics¶
Description
Please give feedback
Session on Project Based Interview Questions¶
Description
Feedback - https://forms.gle/Z6eJ2KGW8PD6m3kq6
Session 1 on ML Interview Questions¶
Description
Feedback - https://forms.gle/BuHb3bKNuiDEKq54A
Recording - Session 3 on ML Interview Questions¶
Description
Feedback Form - https://forms.gle/grsYYRzmG8YasmCQ7
Miscellaneous Topics¶
Session 1 on Imbalanced Data - Introduction¶
Description
Session 2 on Imbalanced Data - Oversampling Techniques¶
Description
Session 3 on Imbalanced Data - Undersampling Techniques¶
Description
Other Boosting Frameworks¶
Session 1 on Introduction to LightGBM¶
Description
Session 2 on LightGBM (GOSS & EFB)¶
Description
Session 1 on CatBoost - Practical Introduction¶
Description
Advanced XGBoost¶
Session on XGBoost Regularization¶
Description
Session 2 on XGBoost Regularization¶
Description
Session on XGBoost Optimizations¶
Description
How XGBoost Handles Missing Values¶
Description
Feature Engineering¶
Session 1 on Encoding Categorical Features¶
Description
Code - https://colab.research.google.com/drive/1PIZQpOTZgXMpQUQ0SUT51RpfvH-jVSFi?usp=sharing
Dataset 1 - https://drive.google.com/file/d/1B0YNqPgjTat67SAc5nIpeNSDWRJQRz9e/view?usp=sharing
Dataset 2 - https://drive.google.com/file/d/1a9kmZni3NJqEP2-7v4oHbPMr9UgoUnnQ/view?usp=sharing
Notes - https://drive.google.com/file/d/1yueCF-CJU7p8lag9GTAumiuVE4pDOXXu/view?usp=sharing
Session on Sklearn ColumnTransformer & Pipeline¶
Description
Session on Sklearn Deep Dive¶
Description
Session 2 on Encoding Categorical Features¶
Description
Session 1 on Discretization¶
Description
Session 2 on Discretization¶
Description
Session 1 on Handling Missing Data¶
Description
Session 2 on Handling Missing Data¶
Description
Session 3 on Handling Missing Values¶
Description
Session on Feature Scaling¶
Description
Session 2 on Feature Scaling¶
Description
Session 1 on Outlier Detection¶
Description
Session 2 on Outlier Detection¶
Description
Session 3 on Outlier Detection¶
Description
Session on Feature Transformation¶
Unsupervised Machine Learning¶
Session on DBSCAN¶
Description
Session on Hierarchical Clustering¶
Description
Session on Gaussian Mixture Models¶
Description
Session 2 on Gaussian Mixture Models¶
Description
Session on T-SNE¶
Description
Code - https://colab.research.google.com/drive/1N2kGH2U73JkMbD_OPp4H0QgHE066GlwG?usp=sharing
Blog 1 - https://distill.pub/2016/misread-tsne/
Blog 2 - https://colah.github.io/posts/2014-10-Visualizing-MNIST/
Notes - https://drive.google.com/file/d/1FqmADKeMxbrbH2oPGW-m9wZ5fcfkoPWF/view?usp=sharing
Research Paper - https://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
Session 2 on T-SNE¶
Description
KMeans Clustering¶
Session 1 on K Means Clustering¶
Description
Session 2 on KMeans Clustering¶
Description
Code - https://colab.research.google.com/drive/18CynErsHaQ_BanYv0ruq2mFF9Hncmu2V?usp=sharing
Assignment - https://www.kaggle.com/code/campusx/ipl-kmeans-clustering
Notes - https://drive.google.com/file/d/11rBoavT2eGWzElwEkNrgh4PpoZ2MP-mF/view?usp=sharing
Research Papers
https://www.cse.iitd.ac.in/~rjaiswal/2015/col870/Project/Nipun.pdf
https://theory.stanford.edu/~sergei/papers/kMeansPP-soda.pdf
Session 3 on KMeans Clustering¶
Description
Code - https://colab.research.google.com/drive/1j5fLdvQU5-phpm8L5Su6ydGrXbeUMG8a?usp=sharing
Task Dataset - https://www.kaggle.com/datasets/elemento/nyc-yellow-taxi-trip-data?select=yellow_tripdata_2015-01.csv
Research Paper - https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=b452a856a3e3d4d37b1de837996aa6813bedfdcf
Notes - https://drive.google.com/file/d/1sUtu0DOa9DEIuIA41gRarE5Qb4utXyWm/view?usp=sharing
K-Means Clustering Algorithm From Scratch In Python¶
Description
MiniBatch KMeans Task Solution¶
Description
MLOps¶
Session 1 on MLOps - Introduction to MLOps¶
Description
Session 2 on MLOps - Version Control¶
Description
Session 3 on MLOps - Reproducibility¶
Description
Session 4 on MLOps - Data Version Control (DVC)¶
Description
Session 5 on MLOps - ML Pipelines and Experimentation Tracking¶
Description
Session 6 on MLOps¶
Description
Session 7 on MLOps | Continuous Integration¶
Description
Session 8 on MLOps - Docker¶
Description
Session 9 on MLOps - Continuous Deployment¶
Description
Session 10 on MLOps - Introduction to AWS¶
Description
Session 12 on MLOps - Distributed Infrastructure¶
Description
Session 13 on MLOps - Kubernetes Internals¶
Description
Session 14 on MLOps - Deployment on Kubernetes¶
Description
Session 15 on MLOps - Seldon Deployments¶
Description
Session 16 on MLOps - Monitoring & Alerting¶
Description
Session 17 on Rollout & Rollback Strategies¶
Description
Session on MLOps Interview Questions¶
Description
Session 18 on MLOps - ML Technical Debt¶
Description
XGBoost¶
Introduction to XGBoost | XGBoost Part 1¶
Description
XGBoost for Regression | XGBoost Part 2¶
Description
XGBoost For Classification | XGBoost Part 3¶
Description
The Complete Maths of XGBoost | XGBoost Part 3¶
Description
Capstone Project¶
Session 1 on Capstone Project | Data Gathering¶
Description
Datasets
https://docs.google.com/spreadsheets/d/1mFNBKFgwFnCXvRsLps5FbsPt_WNnFYyOvBGwSFHZHRU/edit?usp=sharing
https://docs.google.com/spreadsheets/d/19Uw-4uktVEQKFzVHTRkd0DMJ4v3lQJsiCH63PlHwQdw/edit?usp=sharing
https://docs.google.com/spreadsheets/d/1z55UOBr3nfFYf5JXkCAcTGameKrOm2a0irRfjKrpdSs/edit?usp=sharing
https://docs.google.com/spreadsheets/d/1FzCcUbzBKG78snWFg3tAAD4E1sIjCClD9cfsA2DWAjg/edit?usp=sharing
Web Scraping Codes
Flats/Apartments: https://colab.research.google.com/drive/1bKT92iRVecazQcc3eJmpZH7HZyCLk-oO?usp=sharing
https://colab.research.google.com/drive/1IclV7RVZSVNe3fo5WapspW6uo9uWLTU5?usp=sharing
https://colab.research.google.com/drive/1cmJ9xbSvErNXnfVP0xVBpcvf2hRz3fmp?usp=sharing
Notebook PDF : https://drive.google.com/file/d/179HLl-HQVoAFUKcGUtUujW72T1QJpcCJ/view?usp=sharing
Session 2 on Capstone Project | Data Cleaning¶
Description
Session 3 on Capstone Project | Feature Engineering¶
Description
Session 4 on Capstone Project | EDA¶
Description
Session 5 on Capstone Project | Outlier Detection and Removal¶
Description
Code - https://github.com/campusx-official/dsmp-capstone-project
Notebook PDF Session 4 Onwards: https://drive.google.com/file/d/1PS-M1pWgfU_wMKDJNQq1iNNSBb_iRsuG/view?usp=sharing
Session 6 on Capstone Project | Missing Value Imputation¶
Description
Code - https://github.com/campusx-official/dsmp-capstone-project
Notebook PDF Session 4 Onwards: https://drive.google.com/file/d/1PS-M1pWgfU_wMKDJNQq1iNNSBb_iRsuG/view?usp=sharing
Session 7 on Capstone Project | Feature Selection¶
Description
Code - https://github.com/campusx-official/dsmp-capstone-project
Notebook PDF Session 4 Onwards: https://drive.google.com/file/d/1PS-M1pWgfU_wMKDJNQq1iNNSBb_iRsuG/view?usp=sharing
Session 8 on Capstone Project | Model Selection & Productionalization¶
Description
Code - https://github.com/campusx-official/dsmp-capstone-project
Website Code - https://github.com/campusx-official/real-estate-app
Notebook PDF Session 4 Onwards: https://drive.google.com/file/d/1PS-M1pWgfU_wMKDJNQq1iNNSBb_iRsuG/view?usp=sharing
Session 9 on Capstone Project | Building the Analytics Module¶
Description
Session 10 on Capstone Project | Building the Recommender System¶
Description
Code - https://github.com/campusx-official/dsmp-capstone-project
Apartment Data : https://colab.research.google.com/drive/1Ms-86hbsFojEG_0lXdeI5wiFzm5T0xUM?usp=sharing
Notebook PDF Session 4 Onwards: https://drive.google.com/file/d/1PS-M1pWgfU_wMKDJNQq1iNNSBb_iRsuG/view?usp=sharing
Session 11 on Capstone Project | Building the Recommender System Part 2¶
Description
Session 12 on Capstone Project | Building the Insights Module¶
Description
Session 13 on Capstone Project | Deploying the application on AWS¶
Description
Week 36 - Gradient Boosting¶
Session 1 on Gradient Boosting for Regression¶
Description
Session 2 on Gradient Boosting | Perspectives¶
Description
Gradient Boosting for Classification Part 1¶
Description
Gradient Boosting for Classification | Geometric Intuition¶
Description
Gradient Boosting Classification | Maths Formulation¶
Description
Week 35 - Random Forest¶
Bagging | Introduction | Part 1¶
Bagging Ensemble | Part 2 | Bagging Classifiers¶
Bagging Ensemble | Part 3 | Bagging Regressor¶
Session 1 on Random Forest¶
Description
Session 2 on Random Forest¶
Description
Week 34 - Decision Trees¶
Session 1 on Decision Trees¶
Description
Session 2 on Decision Trees¶
Description
Session 3 on Decision Trees | Pruning¶
Description
Awesome Decision Tree Visualization using dtreeviz library¶
Description
Week 33 - Support Vector Machines (SVM)¶
SVM Part 1 - Hard Margin SVM¶
Description
SVM Part 2 | Soft Margin SVM¶
Description
Session on Constrained Optimization Problem¶
Description
Session on SVM Dual Problem¶
Description
Session on Maths Behind SVM Kernels¶
Description
Week 32 - Logistic Regression¶
Session 1 on Logistic Regression¶
Description
Session on Multiclass Classification using Logistic Regression¶
Description
Session on Maximum Likelihood Estimation¶
Description
Session 3 on Logistic Regression¶
Description
Code - https://colab.research.google.com/drive/14yneTfvrXQLC_drOPCme1lWKMnxLGN5d?usp=sharing
https://colab.research.google.com/github/campusx-official/100-days-of-machine-learning/blob/main/day60-logistic-regression-contd/polynomial-logistic-regression.ipynb
Notebook PDF : https://drive.google.com/file/d/1A4KTY5onEMazPXLnRxQiJOMBnyfZoGj6/view?usp=sharing
Logistic Regression Hyperparameters¶
Description
Code - https://github.com/campusx-official/100-days-of-machine-learning/tree/main/day60-logistic-regression-contd
This link contains all the notebook PDFs used in the 100 Days of ML playlist:
https://lnkd.in/gxv947TB
Week 31 - Naive Bayes¶
Crash Course on Probability Part 1¶
Description
Crash Course on Probability Part 2¶
Description
Code - https://colab.research.google.com/drive/1q0yJ-6pTLkXyETwS41uFf5ggOagxydvt?usp=sharing
Notebook PDF (Probability Part 2): https://drive.google.com/file/d/1ygZT5izsUcp8AsmmqDjj8CcP_5WpKnFP/view?usp=sharing
Session 1 on Naive Bayes¶
Description
Session 2 on Naive Bayes¶
Description
Session 3 on Naive Bayes¶
Description
Email Spam Classifier | End to End Project¶
Description
Week 30 - Model Evaluation and Selection¶
ROC Curve in Machine Learning¶
Description
Session on Cross Validation¶
Description
Session on Data Leakage¶
Description
Session on Hyperparameter Tuning¶
Description
Week 29 - PCA¶
PCA Part 3 | Code Example and Visualization¶
Description
Session on Eigen Vectors and Eigen Values¶
Description
Session on Eigen Decomposition + PCA Variants¶
Description
Session on Singular Value Decomposition¶
Description
Week 28 - K Nearest Neighbors¶
Session 1 on K-Nearest Neighbors¶
Description
Coding K Nearest Neighbors from Scratch¶
Description
How to draw Decision Boundary for classification algorithms¶
Description
Session on Advanced KNN¶
Description
Classification Metrics Part 1 | Accuracy and Confusion Matrix | Type 1 and Type 2 Errors¶
Description
Code - https://github.com/campusx-official/100-days-of-machine-learning/tree/main/day59-classification-metrics
Accuracy is not a reliable measure for imbalanced data: https://colab.research.google.com/drive/12hFmzcyfQetk5MUlTMNrrl1VCoYDGNfT?usp=sharing
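A minimal sketch of why accuracy misleads on imbalanced data, in plain Python. The labels are hypothetical (95 negatives, 5 positives) and the "model" simply always predicts the majority class; the linked notebook may use scikit-learn instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical imbalanced labels: 95 negatives (0) and 5 positives (1)
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0
```

The 95% accuracy hides the fact that every positive case is missed, which is what precision, recall and F1 are designed to expose.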
Classification Metrics Part 2 | Precision, Recall and F1 Score¶
Description
Week 27 - Regularization¶
Regularization Part 1 | Bias Variance Trade-off¶
Description
Regularization Part 2 | What is Regularization | Paid Zoom Session | 19th May¶
Description
Ridge Regression Part 1 | Geometric Intuition and Code | Regularized Linear Models¶
Description
Ridge Regression Part 2 | Mathematical Formulation & Code from scratch | Regularized Linear Models¶
Description
Ridge Regression Part 3 | Gradient Descent | Regularized Linear Models¶
Description
Ridge Regression Part 4 | 5 Key Points | Regularized Linear Models¶
Description
Lasso Regression | Intuition and Code Sample | Regularized Linear Models¶
Description
Why does Lasso Regression create sparsity?¶
Description
Code -
ElasticNet Regression | Intuition and Code Example | Regularized Linear Models¶
Description
Week 26 - Feature Selection¶
Session 54 - Feature Selection Part 1 | Filter Methods¶
Description
Session 55 - Feature Selection Part 2 | Wrapper Methods¶
Description
Session 3 on Feature Selection | Embedded Methods¶
Description
Week 25 - Regression Analysis¶
Session 1 on Regression Analysis¶
Description
Session 2 on Regression Analysis¶
Description
Polynomial Regression¶
Description
Session on Assumptions of Linear Regression¶
Description
Session 53 - Multicollinearity¶
Description
Week 24 - Gradient Descent¶
Session 51 - Gradient Descent From Scratch¶
Description
Session 52 (Part 1) - Batch Gradient Descent¶
Description
Session 52 (Part 2) - Stochastic Gradient Descent¶
Description
Session 52 (Part 3) - Mini-Batch Gradient Descent¶
Description
Week 23 - Linear Regression¶
Session 48 - Introduction to Machine Learning¶
Session 49 - Simple Linear Regression¶
Description
Session 50 - Multiple Linear Regression¶
Description
Session on Optimization: The Big Picture¶
Description
Session on Differential Calculus¶
Description
Week 22 - Linear Algebra¶
Linear Algebra - Part 1 | Vectors¶
Description
Linear Algebra Part 2 | Matrices (Computation)¶
Description
Linear Algebra Part 3 | Matrices (Intuition)¶
Description
Tools
https://campusx-official-matrix-linear-transformation-viz-linear-x7jwva.streamlit.app/
https://campusx-official-matrix-linear-transformation-v-multiply-5a32p1.streamlit.app/
https://campusx-official-matrix-linear-transformatio-determinant-96ncsg.streamlit.app/
PDF: https://drive.google.com/file/d/1XCIpJ-vPuuMniSipujE4GNXXOuVK1W6y/view?usp=share_link
Week 21 - Hypothesis Testing¶
Session 45 - Hypothesis Testing Part 1¶
Description
Session 45 PDF Notebook: https://drive.google.com/file/d/1J6TWERqWu1-98n2b8uBKdU8j0aCVgyuN/view?usp=share_link
Session 46 - Hypothesis Testing Part 2 | p-values | t-tests¶
Description
Code - https://colab.research.google.com/drive/1W2ts8cTUwnAQL47QHZI3iWoE6KJz4bHf?usp=sharing
https://www.kaggle.com/campusx/titanic-single-sample-t-test
https://www.kaggle.com/campusx/titanic-2-sample-t-test
Notebook PDF Link: https://drive.google.com/file/d/17rN645-blGEO59vA6Jkvieh-IbFeOP8t/view?usp=share_link
Session on Chi Square Tests¶
Description
Code - https://colab.research.google.com/drive/113wsZhFcUDa-QnOeY3KmIBsJToyzE80e?usp=sharing
PDF - https://drive.google.com/file/d/1nmxMlse95CE0it612u55s9hudL2Xef4K/view?usp=share_link
Timestamp 02:01:00:
The chi-square statistic is Σ (Observed - Expected)^2 / Expected, not Σ (Observed - Expected)^2 / Observed.
The calculation goes like this:
(15-12)^2 / 12 + (20-19)^2 / 19 + ... + (40-12)^2 / 12
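A minimal plain-Python sketch of the corrected formula. The counts below are hypothetical (not the session's table); the point is that each squared difference is divided by the expected count, and the sketch also shows how far the wrong version drifts.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (both totals are 60)
observed = [15, 20, 25]
expected = [12, 19, 29]

correct = chi_square_statistic(observed, expected)                 # divides by expected
wrong = sum((o - e) ** 2 / o for o, e in zip(observed, expected))  # divides by observed
print(round(correct, 4), round(wrong, 4))
```

If SciPy is available, scipy.stats.chisquare(observed, expected) computes the same statistic together with a p-value.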
Session on ANOVA¶
Description
Week 20 - Inferential Statistics¶
Session 43 - Central Limit Theorem¶
Description
Session 44 - Confidence Intervals¶
Description
Session 44 Notebook PDF : https://drive.google.com/file/d/1nskWHtR1ePmrje76k71gdUc2-fcVWvMH/view?usp=share_link
Week 19 - Probability Distributions¶
Session 41 - Normal Distribution¶
Description
Code - https://colab.research.google.com/drive/1N_T0_w5vpT1k1Z4pSf4IMhAxYT1nRKLU?usp=sharing
Viz tool - https://samp-suman-normal-dist-visualize-app-lkntug.streamlit.app/
Z-table - https://www.ztable.net/
Notebook Pdf : https://drive.google.com/file/d/11V7c5D80UDteolb_DVgS0em72pfMB-i0/view?usp=share_link
Session 42 - Non-Gaussian Probability Distributions¶
Description
Code - https://colab.research.google.com/drive/1Q2ug8BXogFqYY_6e04dmHk0Tn5_yfObo?usp=sharing
https://github.com/campusx-official/100-days-of-machine-learning/tree/main/day30-function-transformer
https://github.com/campusx-official/100-days-of-machine-learning/tree/main/day31-power-transformer
Session 42 Notebook PDF: https://drive.google.com/file/d/1sd4nz8PNsGc334ng86V8uKvJNqhWyRZ2/view?usp=share_link
Session on Views and User Defined Functions in SQL¶
Description
Session Notebook PDF (View and User Defined Function): https://drive.google.com/file/d/1NxvHiK-NJBIAzKMfFwf2ibgDRzYadqpS/view?usp=share_link
Session on Transactions & Stored Procedures¶
Description
Session Notebook PDF : https://drive.google.com/file/d/1EbU6gv0xLmllHRvnpHkp4HEmqHGDt7qx/view?usp=share_link
Week 18 - Descriptive Statistics Contd.¶
Session 39 - Descriptive Statistics Part 2¶
Description
Session 40 - Probability Distribution Functions - PDF, PMF & CDF¶
Description
SQL Datetime Case Study on Flights dataset¶
Description
Code - https://docs.google.com/document/d/1g67XZ96yhIz6mqfzXJRhVvZXP26VVbYE4YX51Ck4c84/edit?usp=sharing
Dataset - https://docs.google.com/spreadsheets/d/13_PAiduepzVBMU-WYp_10NBMfMH12A9D2KgzieOtk1o/edit?usp=sharing
EDA questions PDF: https://drive.google.com/file/d/1DPA__10bpvte9wtvgLqehpZ9w31HZm2g/view?usp=share_link
For Q6: while running the UPDATE you may get the same warning as in the video, "Truncated incorrect DOUBLE value: '5m'". It is caused by row 5975, whose duration has minutes only ('5m'), a case the query in the video does not handle.
Updated query:
UPDATE flights
SET duration_mins =
    CASE
        -- hours and minutes, e.g. '2h 50m': hours * 60 + minutes
        WHEN duration LIKE '%h %m' THEN
            SUBSTRING_INDEX(duration, 'h', 1) * 60 +
            SUBSTRING_INDEX(SUBSTRING_INDEX(duration, ' ', -1), 'm', 1)
        -- hours only, e.g. '19h'
        WHEN duration LIKE '%h' THEN
            SUBSTRING_INDEX(duration, 'h', 1) * 60
        -- minutes only, e.g. '5m' (row 5975)
        WHEN duration LIKE '%m' THEN
            SUBSTRING_INDEX(duration, 'm', 1)
    END;
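To cross-check the UPDATE outside MySQL, here is a small Python sketch that applies the same three cases ('Xh Ym', 'Xh', 'Ym') to a duration string. It assumes only these formats occur; anything else returns None, just as the CASE without an ELSE yields NULL. The function name is hypothetical.

```python
import re

# Optional hours part ("2h"), optional whitespace, optional minutes part ("50m")
DURATION_RE = re.compile(r"(?:(\d+)h)?\s*(?:(\d+)m)?")


def duration_to_minutes(duration):
    """Convert '2h 50m', '19h' or '5m' to total minutes; None if unparseable."""
    match = DURATION_RE.fullmatch(duration.strip())
    if match is None or match.groups() == (None, None):
        return None  # mirrors the CASE expression falling through to NULL
    hours, minutes = match.groups()
    return int(hours or 0) * 60 + int(minutes or 0)


for value in ["2h 50m", "19h", "5m", "n/a"]:
    print(value, "->", duration_to_minutes(value))
```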
Session on Database Design | SQL Data Types | Database Normalization¶
Description
Session On Database Design Notebook Pdf : https://drive.google.com/file/d/1sQj7UJrSX_Y74qWgmez84Pv3oH6z6TCg/view?usp=share_link
Week 17 - Descriptive Statistics¶
Session 38 - Descriptive Statistics Part 1¶
Description
Session on Datetime in SQL¶
Description
Code - https://docs.google.com/document/d/1Izh0o3ZTsVcSw5ZHsX5uB7v7IGxJ7hbX7a3VfIuFv1c/edit?usp=sharing
PDF - https://drive.google.com/file/d/11s40kbk56ZaA56f7c34SEGmZmy-_GNOa/view?usp=share_link
Documentation - https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_date-format
Week 16 - Advanced SQL¶
Task 36 Solutions¶
Description
Career Pe Charcha - Markdown Basics + How to improve Github Profile¶
Description
Code - https://colab.research.google.com/drive/1-v-RUuQlaVUKiVvvEjpsiK2sw-BovjlB?usp=sharing
Links - https://github-readme-streak-stats.herokuapp.com/?user=campusx-official
https://github-readme-stats.vercel.app/api/top-langs/?username=campusx-official
https://github-readme-stats.vercel.app/api?username=campusx-official
Session 37 - Window Functions Part 2¶
Description
Code - https://docs.google.com/document/d/1PyAU4tBcBxUR5Vn4GEZngPJXkKArAHqUBZgCCu1fFk4/edit?usp=sharing
Datasets: https://drive.google.com/drive/folders/1N3WzcWpwiYwxobFIlNn9sOHVRZ-tpBdc?usp=share_link
Window Functions Part-1 Pdf : https://drive.google.com/file/d/12P7vW2VBq0_4Nm3j1aQDB599HOy0OGtk/view?usp=share_link
Window Functions Part-2&3 Pdf : https://drive.google.com/file/d/1pTPslw_dOMwkK06Cu-lcHX5NzRtmNPW5/view?usp=share_link
Session 37 - Window Functions Part 3¶
Description
Window Functions Part-1 Pdf : https://drive.google.com/file/d/12P7vW2VBq0_4Nm3j1aQDB599HOy0OGtk/view?usp=share_link
Window Functions Part-2&3 Pdf : https://drive.google.com/file/d/1pTPslw_dOMwkK06Cu-lcHX5NzRtmNPW5/view?usp=share_link
Timestamp 19:30: percentile_disc and percentile_cont
These functions are not available in MySQL (the default server used with Workbench).
In the sessions, Workbench is connected to the XAMPP server (which is MariaDB), and that server supports them.
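If your server lacks these functions, a plain-Python sketch shows what they compute, following the standard SQL definitions: PERCENTILE_CONT interpolates linearly between the two nearest rows, while PERCENTILE_DISC returns an actual value from the data (the first one whose cumulative distribution reaches p). Treat it as an illustration and check your server's documentation for exact behaviour.

```python
import math


def percentile_cont(values, p):
    """Continuous percentile: linear interpolation between neighbouring values."""
    if not values or not 0 <= p <= 1:
        raise ValueError("need a non-empty list and 0 <= p <= 1")
    ordered = sorted(values)
    position = p * (len(ordered) - 1)  # 0-based fractional row position
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


def percentile_disc(values, p):
    """Discrete percentile: first value whose cumulative distribution is >= p."""
    if not values or not 0 <= p <= 1:
        raise ValueError("need a non-empty list and 0 <= p <= 1")
    ordered = sorted(values)
    # round() guards against float noise such as 0.1 * 30 == 3.0000000000000004
    k = max(math.ceil(round(p * len(ordered), 9)), 1)
    return ordered[k - 1]


salaries = [30, 10, 40, 20]  # hypothetical data
print(percentile_cont(salaries, 0.5))  # 25.0
print(percentile_disc(salaries, 0.5))  # 20
```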
Session on Data Cleaning using SQL | Laptops Dataset¶
Description
Code - https://docs.google.com/document/d/1_urkFSBPwEzHnZuycGlcjz_S5ofGLXynxKC0cPHP-uM/edit?usp=sharing
PDF - https://drive.google.com/file/d/1bsIjjciJMHLjagopBX-YJN8eoJaVITsr/view?usp=share_link
Laptop dataset Uncleaned: https://www.kaggle.com/datasets/ehtishamsadiq/uncleaned-laptop-price-dataset
"Error Code: 1093 Resolution": https://docs.google.com/document/d/1-z5GmHsSpRWBa2_hvswMxDUO4f-ozsPTG-4mtyExhk8/edit?usp=sharing
1:22:00: the dataset contains duplicate rows.
Session on EDA using SQL | Laptops Dataset¶
Description
Code Data Cleaning - https://docs.google.com/document/d/1_urkFSBPwEzHnZuycGlcjz_S5ofGLXynxKC0cPHP-uM/edit?usp=sharing
Code EDA - https://docs.google.com/document/d/1Izh0o3ZTsVcSw5ZHsX5uB7v7IGxJ7hbX7a3VfIuFv1c/edit?usp=sharing
Laptop dataset Uncleaned: https://www.kaggle.com/datasets/ehtishamsadiq/uncleaned-laptop-price-dataset
"Error Code: 1093 Resolution": https://docs.google.com/document/d/1-z5GmHsSpRWBa2_hvswMxDUO4f-ozsPTG-4mtyExhk8/edit?usp=sharing
EDA Plan
1. head -> tail -> sample
2. For numerical columns
   - 8-number summary [count, min, max, mean, std, q1, q2, q3]
   - missing values
   - outliers
   -> horizontal/vertical histograms
3. For categorical columns
   - value counts -> pie chart
   - missing values
4. Numerical-numerical
   - side-by-side 8-number summary
   - scatterplot
   - correlation
5. Categorical-categorical
   - contingency table -> stacked bar chart
6. Numerical-categorical
   -> compare distributions across categories
7. Missing value treatment
8. Feature engineering
   - ppi
   - price_bracket
9. One-hot encoding
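Two steps of the plan, sketched in plain Python with hypothetical values: the 8-number summary for a numerical column, and the ppi (pixels per inch) feature, assuming the screen resolution has already been split into width and height in pixels and the screen size is in inches. The quartiles use the "inclusive" method, which matches the linear interpolation used by pandas' describe().

```python
import math
import statistics


def eight_number_summary(values):
    """count, min, max, mean, sample std and the three quartiles."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "min": data[0],
        "max": data[-1],
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation, as in pandas
        "q1": q1,
        "q2": q2,
        "q3": q3,
    }


def ppi(width_px, height_px, screen_inches):
    """Pixels per inch: diagonal resolution in pixels / diagonal size in inches."""
    return math.hypot(width_px, height_px) / screen_inches


prices = [35000, 42000, 51000, 60000, 150000]  # hypothetical laptop prices
print(eight_number_summary(prices))
print(round(ppi(1920, 1080, 15.6), 2))  # about 141.21
```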
Week 15 - SQL Continued¶
Session 34 - SQL Joins¶
Description
Task 34 Solutions¶
Description
SQL Case Study 1 | Zomato Dataset¶
Description
Session 35 - Subqueries in SQL¶
Description
Dataset Link - https://drive.google.com/drive/folders/1xCNbO_LJIkr7bi9YDa7hUFYgJ-IZ01A-?usp=share_link
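For reading movies.csv in Python (the file is semicolon-separated):
import pandas as pd
df = pd.read_csv('movies.csv', delimiter=';', encoding_errors='ignore')  # encoding_errors='ignore' skips bytes that cannot be decoded (pandas 1.3+)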
Task 35 Solutions¶
Description
Making a Flights Dashboard using Python and SQL¶
Description
SQL Interview Questions Part 1¶
Description
Week 14 - SQL Continued¶
Session 32 - SQL DML Commands¶
Description
Task 32 Solutions¶
Description
Session 33 - SQL Grouping + Sorting¶
Description
Task 33 Solutions¶
Description
Career QnA¶
Description
Session 2 on Tableau - Sales Dataset¶
Description
Week 13 - SQL Basics¶
Session 30 - Database Fundamentals¶
Description
Session 31 - SQL DDL Commands¶
Description
Session 1 on Tableau - Olympics Dataset¶
Description
Week 12 - Data Analysis Process Contd.¶
Session on Data Cleaning Case Study - Smartphone dataset¶
Description
Code - https://colab.research.google.com/drive/1TGYxt3X2YN7SlfocQg_-6A9pakp-WXZX?usp=sharing
Session 29 - Exploratory Data Analysis (Titanic Dataset)¶
Description
Notebook Link: https://colab.research.google.com/drive/13rFqQJqU5RgxSdtUARZAUrzAoweE3rbQ?usp=sharing
Dataset Link: https://drive.google.com/drive/folders/1oFZxHRuAw_JI7soe46mmO61s-WM7jtQg?usp=share_link
Session on Data Cleaning Part 2¶
Description
Code - https://colab.research.google.com/drive/1TGYxt3X2YN7SlfocQg_-6A9pakp-WXZX?usp=sharing
Session on EDA Case Study - Smartphones Dataset¶
Description
EDA code - https://colab.research.google.com/drive/1CLkCDQAFZfNmLO0MRf14bTKbnV-1NtMs?usp=sharing
Data Cleaning round 1 code - https://colab.research.google.com/drive/1TGYxt3X2YN7SlfocQg_-6A9pakp-WXZX?usp=sharing
Data Cleaning round 2 code - https://colab.research.google.com/drive/1E7nUdvyKpm6C-4oIw67rV6EufLEeCTrx?usp=sharing
Dataset (v3 & v5): https://drive.google.com/drive/folders/1xujCj-9CAwtenyPdtU07bMPXomarOSoV?usp=share_link
Week 11 - Data Analysis Process¶
Session 27 - Data Gathering | Data Analysis Process¶
Description
Codes:
https://github.com/campusx-official/100-days-of-machine-learning/tree/main/day17-api-to-dataframe
https://github.com/campusx-official/pandas-io
Correction at 55:45: there is a mistake in the video. Inside the for loop, the loop variable should be chunk, not chunks.
Task 27 Solutions¶
Description
Session 28 - Data Assessing and Cleaning¶
Description
Code - https://colab.research.google.com/drive/1ca-jlBvJ4uqpbCHFFgCFp9akIY7FSmGc?usp=sharing
Dataset - https://github.com/campusx-official/data-wrangling
For the error at 2:03:00 ("'float' object is not subscriptable") when extracting the phone number and email, use the code below. find_contact_details is the helper defined earlier in the session notebook.
# For phone number: when x[0] is a float (NaN), there is no phone number, so return 'No data'
patients_df["contact"].apply(find_contact_details).apply(lambda x: 'No data' if isinstance(x[0], float) else x[0][-1])
# For email
patients_df["contact"].apply(find_contact_details).apply(lambda x: x[1])
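Why the error happens: missing cells in a pandas column are NaN, which is a Python float, so indexing it (x[0]) fails with "'float' object is not subscriptable". A minimal stdlib sketch of the guard; extract_phone and its regex are hypothetical stand-ins, not the session's find_contact_details().

```python
import math
import re

# Hypothetical pattern: a digit, then 8+ digits/dashes/spaces, ending in a digit
PHONE_RE = re.compile(r"\+?\d[\d\-\s]{8,}\d")


def extract_phone(contact):
    """Return the last phone number in `contact`, or 'No data' if there is none."""
    if not isinstance(contact, str):  # NaN (a float) or any other non-text value
        return "No data"
    phones = PHONE_RE.findall(contact)
    return phones[-1] if phones else "No data"


print(extract_phone("951-094-9690 john@example.com"))  # 951-094-9690
print(extract_phone(math.nan))                         # No data
```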
Session on ETL using AWS RDS¶
Description
Session on Advanced Web Scraping using Selenium¶
Description
Code - https://github.com/campusx-official/advanced-web-scraping
Chrome Driver - https://chromedriver.chromium.org/downloads
Selenium docs - https://selenium-python.readthedocs.io/
Week 10 - Data Visualization Continued¶
Session 25 - Plotting using Seaborn¶
Description
Task 25 Solutions¶
Description
Session 26 - Plotting using Seaborn Part 2¶
Description
Code - https://colab.research.google.com/drive/18GuhOaBBhaBJ9RtVNHRJQzNxNPPSKBrD?usp=sharing
Seaborn Theming and Color Palette: https://colab.research.google.com/drive/1FjKejgCJwUsxm_XYiRf25-jCvW4rDK64?usp=sharing
Task 26 Solutions¶
Description
Session on Open Source Software Part 1¶
Description
Github discussions - https://resources.github.com/devops/process/planning/discussions/
Github Projects - https://docs.github.com/en/issues/planning-and-tracking-with-projects/learning-about-projects/about-projects
Github Actions - https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions
Session on Open Source Software Part 2¶
Description
Week 9 - Data Visualization¶
Session 23 - Plotting using Matplotlib¶
Description
Datasets used in the session - https://drive.google.com/drive/folders/1_TyTVEMxhEoIs1nsU4V_p1rH5x4jLSiW?usp=share_link
Notebook Link - https://colab.research.google.com/drive/1ksmroQtN_KoCeJzpzPgGAbLv0UG_6G_a?usp=sharing
Task 23 Solutions¶
Description
Session 24 - Advanced Matplotlib¶
Description
Datasets used in the session - https://drive.google.com/drive/folders/17q7WRLJ7hdkA7nk8J_GUTEcZ7WHgPKUg?usp=share_link
Notebook Links - https://colab.research.google.com/drive/14TP6tNzUT5M0YfgzwTMF_6WBuQkLUgXp?usp=sharing
Task 24 Solutions¶
Description
Session on Plotly(Express)¶
Description
Making a Corona virus(Covid-19) Dashboard using Plotly and Dash¶
Description
Project using Plotly¶
Description
Datasets - https://www.kaggle.com/datasets/sirpunch/indian-census-data-with-geospatial-indexing
https://www.kaggle.com/datasets/danofer/india-census?select=india-districts-census-2011.csv
Kaggle Notebook - https://www.kaggle.com/code/campusx/notebook1f43313be3
Project Files - https://github.com/campusx-official/india-data-viz-mini-project
For the ModuleNotFoundError in the Indian Startup Funding session at 2:13:00:
-> A similar issue is resolved in this session from timestamp 1:54:35.
Week 8 - Advanced Pandas Continued¶
Session 21 - MultiIndex Series and DataFrames¶
Description
Datasets used in the session - https://drive.google.com/drive/folders/1AP_M96SnIe985aQQp9SmDkz69AXHrs5t?usp=share_link
Notebook Link - https://colab.research.google.com/drive/17l8EddlrS2Ed35frmeS6cHAvdf5Fbw-g?usp=sharing
Task 21 Solutions¶
Description
Session 22 - Vectorized String Operations | DateTime in Pandas | Pivot Table¶
Description
Datasets used in the session - https://drive.google.com/drive/folders/1Vy1LilxgmyBiDg-UAnrBnJ5R1XBvwHGx?usp=share_link
Notebook Link -
Multi Index Object : https://colab.research.google.com/drive/17l8EddlrS2Ed35frmeS6cHAvdf5Fbw-g?usp=sharing
Strings : https://colab.research.google.com/drive/1IbvN3BABXN2sgxNr3EckyB_WO4tP0EpE?usp=sharing
Date Time : https://colab.research.google.com/drive/1zkfBGu48iLfJWNzAosD_qbCjzi7dYf8-?usp=sharing
Task 22 Solutions¶
Description
Pandas Case Study - Time Series Analysis¶
Description
Pandas Case Study 2 - Working with textual data¶
Description
Week 7 - Advanced Pandas¶
Session 19 - GroupBy Object in Pandas¶
Description
Datasets used in the session - https://drive.google.com/drive/folders/1IiMIOGCv-giUV_rtF02sImgkaVuAQxkz?usp=share_link
Notebook Link - https://colab.research.google.com/drive/1JZwTCZp2kbiTACzcuXmWq_FL-GRNo9ym?usp=sharing
Task 19 Solutions¶
Description
Code - https://colab.research.google.com/drive/157g2UZM9StxNMz-dUw2uPH-JeKn6z269?usp=sharing
IPL Deliveries Dataset (for Q5 to Q8): https://docs.google.com/spreadsheets/d/1ROM5oTEEMXfnBHAmz3XC5lwGMgnQaw80PiXadKHRgGU/edit?usp=sharing
Session 20 - Merging, Joining & Concatenating¶
Description
Datasets used in the session - https://drive.google.com/drive/folders/1tE0LxbzsVX70y8Br_VxiZas288ODDBup?usp=share_link
Notebook Link - https://colab.research.google.com/drive/1Xs7On5fr6ZZnrwGxgMXWld2XUrniqXGj?usp=sharing
Task 20 Solutions¶
Description
Session on Streamlit¶
Description
Code - https://github.com/campusx-official/streamlit-basics
Learn LaTeX - https://www.overleaf.com/learn/latex/Learn_LaTeX_in_30_minutes#What_is_LaTeX.3F
Learn Markdown - https://www.markdownguide.org/basic-syntax/#images-1
Streamlit docs - https://docs.streamlit.io/library/api-reference
Dataset link - https://www.kaggle.com/datasets/sudalairajkumar/indian-startup-funding
Plan of Action - https://docs.google.com/document/d/1zk4751zmG2b4XnYGW06tu0MWyr2PgLlMaSci7eUVL2M/edit?usp=sharing
Pandas Case Study - Indian Startup Funding¶
Description
Code - https://github.com/campusx-official/streamlit-basics
Kaggle code - https://www.kaggle.com/campusx/startup-data-analysis
Plan of Action - https://docs.google.com/document/d/1zk4751zmG2b4XnYGW06tu0MWyr2PgLlMaSci7eUVL2M/edit?usp=sharing
Dataset - https://www.kaggle.com/datasets/sudalairajkumar/indian-startup-funding
Timestamp -> 0:40:50
# Fix the malformed dates, then convert the column to datetime (values are dd/mm/yyyy, so parse day-first)
df['date'] = df['date'].replace({'05/072018':'05/07/2018', '01/07/015':'01/07/2015', '22/01//2015':'22/01/2015'})
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
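The raw dates are in dd/mm/yyyy form, so they must be read day-first; otherwise 05/07/2018 can be misread as 7 May instead of 5 July. A stdlib sketch of the same fix-then-parse step for single values, reusing the three corrections from the snippet above; the helper name is hypothetical.

```python
from datetime import datetime

# The three malformed values corrected in the session
DATE_FIXES = {
    "05/072018": "05/07/2018",
    "01/07/015": "01/07/2015",
    "22/01//2015": "22/01/2015",
}


def parse_funding_date(raw):
    """Apply the known fixes, then parse as day/month/year."""
    cleaned = DATE_FIXES.get(raw.strip(), raw.strip())
    return datetime.strptime(cleaned, "%d/%m/%Y")


print(parse_funding_date("22/01//2015").date())  # 2015-01-22
print(parse_funding_date("05/07/2018").date())   # 2018-07-05
```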
For the issue at 2:03:58, store the selected option in session state as shown below. The rest of the code is the same as in the GitHub repo.
# key='analysis' also stores the widget's value in st.session_state['analysis']
st.session_state.option = st.sidebar.selectbox(
    'Select One', ['Overall Analysis', 'StartUp', 'Investor'], key='analysis')
option = st.session_state.option
if option == 'Overall Analysis':
    load_overall_analysis()  # defined in the GitHub repo
For the ModuleNotFoundError at 2:13:00:
-> A similar issue was resolved in the Week 9 "Project Using Plotly" session, from timestamp 1:54:35.
Session on Git¶
Description
Download Git - https://git-scm.com/download/win
What is Git
What is VCS/SCM
Examples of VCS
Why Git/VCS is needed
Types of VCS
- Centralized
- Distributed
Advantages
- Version control
- Bug Fixing
- Non-linear development
- Collaborative development
************************************
How Git works -> terminology
Installing Git
************************************
Creating a repo
cloning someone else's repo
status
************************************
Making Changes
- add
- commit
- When to commit?
- commit messages?
** short
** Explain what the commit does
** Rule of thumb: no "and" (if you need one, split it into two commits)
** Should complete the sentence "This commit will ..."
- add .
- gitignore
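The commit-message rules of thumb above can be turned into a quick self-check. This is a hypothetical helper, not a Git feature: the 50-character limit is an assumed convention (the session only says "short"), and the "and" check is a simple word match, so treat its warnings as hints.

```python
import re

MAX_SUBJECT_LEN = 50  # assumed limit: a common convention, not a Git rule


def commit_subject_warnings(subject: str) -> list[str]:
    """Return warnings for a commit subject based on the rules of thumb above."""
    warnings = []
    if len(subject) > MAX_SUBJECT_LEN:
        warnings.append(f"longer than {MAX_SUBJECT_LEN} characters")
    # Rule of thumb: an 'and' usually means two changes, so two commits
    if re.search(r"\band\b", subject, flags=re.IGNORECASE):
        warnings.append("contains 'and'; consider splitting the commit")
    return warnings


def as_sentence(subject: str) -> str:
    """Read the subject inside the 'This commit will ...' template."""
    return f"This commit will {subject[:1].lower()}{subject[1:]}"


print(as_sentence("Add login page"))
print(commit_subject_warnings("Add login page and fix footer typo"))
```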
***********************************
Seeing commits
- log -> --oneline -> --stat -> -p
- show
- seeing commits of someone else's repo
- diff
**********************************
Creating versions of software
- tag->X.Y.Z
X – The major version, used for making major and backward-incompatible changes.
Y – The minor version, used for adding functionality while maintaining backwards compatibility.
Z – The patch version, used for making small bug fixes while maintaining backwards compatibility.
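The X.Y.Z rules above can be expressed as a small bump function. This is a sketch that assumes plain `X.Y.Z` strings (no `v` prefix or pre-release suffix). Resetting the lower numbers to 0 follows the Semantic Versioning convention, even though the notes above do not spell it out. It also shows why versions should be compared numerically rather than as text.

```python
def bump(version: str, part: str) -> str:
    """Bump an X.Y.Z version string following the rules above."""
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":   # backward-incompatible change: reset minor and patch
        return f"{major + 1}.0.0"
    if part == "minor":   # new backward-compatible functionality: reset patch
        return f"{major}.{minor + 1}.0"
    if part == "patch":   # backward-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def version_key(version: str) -> tuple[int, int, int]:
    """Compare versions numerically, so 1.10.0 sorts after 1.9.0."""
    major, minor, patch = (int(n) for n in version.split("."))
    return (major, minor, patch)


print(bump("1.4.2", "minor"))
print(sorted(["1.10.0", "1.9.0", "1.9.3"], key=version_key))
```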
- deleting tag
- adding tag to a past commit
**********************************
GIT PDF : https://drive.google.com/file/d/1jmialN0Jhhuj5fl2K1N7R9LosoIjrtDb/view?usp=sharing
Session on Git and Github Part 2¶
Description
******************************************************
Non linear development(Branching)
******************************************************
-> Scenario(Individual)
-> Scenario(Team)
-> Using branches -> You had one branch already
-> concept of head pointer
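HEAD is a reference to the branch you currently have checked out, and that branch in turn points to its most recent commit. So HEAD works like a pointer that keeps track of the latest commit on your current branch. (If you check out a past commit directly, HEAD points at that commit instead; this is called a "detached HEAD".)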
-> Creating branches on head
-> Creating branches on past commits
-> Show all branches -> Active Branch
-> switch between branches->How this works?
-> Understanding what will come under a branch(git log)
-> Making new commits in all branches(git log)
-> see all branches at once -> --graph --all
-> deleting branches
******************************************************
Merging Branches
******************************************************
-> What is merging
-> What happens at merging
** In a regular (non-fast-forward) merge, a new merge commit is created; a fast-forward merge only moves the branch pointer
** look at the branches that it's going to merge
** look back along the branch's history to find a single commit that both branches have in their commit history
** combine the lines of code that were changed on the separate branches together
** makes a commit to record the merge
** Note - Merging happens at the checked out branch. No new branches are created
-> Types of merging -> Fast Forward -> Regular(Divergent branches)
-> Fast Forward -> show log
-> Merging Divergent Branches -> show log
-> Merge Conflict
(<<<<<<< HEAD) marks the start of the conflict; everything below this line (until the next indicator) is the code on the current branch
(=======) marks the end of the current branch's lines; everything that follows (until the next indicator) is what's on the branch being merged in
(>>>>>>> heading-update) marks the end of the incoming lines; heading-update is the name of the branch being merged in
-> Resolving Conflicts
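The merge steps above (find a common ancestor, then decide between fast-forward and a regular merge) can be modelled on a toy commit graph. This is a simplified sketch with hypothetical commit IDs. Real Git stores commit objects and computes the common ancestor with `git merge-base`, but the reachability logic is the same idea.

```python
# Hypothetical toy history: each commit ID maps to its parent commit IDs.
#   c1 <- c2 <- c3          (main)
#           \
#            f1 <- f2       (feature, branched off at c2)
PARENTS = {"c1": [], "c2": ["c1"], "c3": ["c2"], "f1": ["c2"], "f2": ["f1"]}


def ancestors(commit: str, parents: dict) -> set:
    """The commit itself plus every commit reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_base(a: str, b: str, parents: dict) -> set:
    """Common ancestors that are not themselves ancestors of another common one."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {c for c in common
            if not any(c != o and c in ancestors(o, parents) for o in common)}


def merge_kind(current: str, incoming: str, parents: dict) -> str:
    """Classify merging `incoming` into the checked-out branch tip `current`."""
    if incoming in ancestors(current, parents):
        return "already up to date"
    if current in ancestors(incoming, parents):
        return "fast-forward"          # just move the branch pointer, no new commit
    return "regular (divergent)"       # combine changes and record a merge commit


print(merge_base("c3", "f2", PARENTS))  # {'c2'}
print(merge_kind("c2", "f2", PARENTS))
print(merge_kind("c3", "f2", PARENTS))
```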
****************************************************************************************************
Undoing Changes
****************************************************************************************************
-> editing the last commit message
-> forgot to add some files to the last commit
-> rolling back to a specific state using show
-> revert a commit
*****************************************************************************************************
Working with a remote repo
******************************************************************************************************
-> Need -> scenario-> collaboration
-> The flow diagram
-> create a new repo on github
-> add remote(git remote add origin
-> push code(git push
-> git log -> tracking branch
-> add a readme file
-> pull code
GIT PDF : https://drive.google.com/file/d/1jmialN0Jhhuj5fl2K1N7R9LosoIjrtDb/view?usp=sharing
Week 6 - Pandas¶
Session 16 - Pandas Series¶
Description
Important Series Methods | Supplementary Session¶
Description
Session 17 - Pandas DataFrame¶
Description
Session 18 - Important DataFrame Methods¶
Description
Session on API Development Using Flask¶
Description
Week 6 - Numpy Interview Questions¶
Description
Task 16 Solutions¶
Description
Task 17 Solutions¶
Description
Task 18 Solutions¶
Description
Code - https://colab.research.google.com/drive/1NcOunueBaiVEkj2Y4DcGyf2jBIHlE9QQ?usp=sharing
Question No. 6 Solution:
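Correction to the notebook: when calculating home_win and away_win, use the bitwise AND operator (&). The posted solution uses bitwise OR (|), which counts every match that is either a win or a home (or away) match, so the percentages come out wrong.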
home_win = df[(df.WinningTeam == team) & (df.Team1 == team)].shape[0] / df[df.Team1 == team].shape[0] * 100
away_win = df[(df.WinningTeam == team) & (df.Team2 == team)].shape[0] / df[df.Team2 == team].shape[0] * 100
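To see the difference between & and |, here is a minimal check on a hypothetical four-match frame. It uses the column names from the notebook and follows the notebook in treating Team1 as the home side. With OR, the "percentage" can even exceed 100.

```python
import pandas as pd

# Hypothetical toy match data (Team1 treated as home side, Team2 as away side)
df = pd.DataFrame({
    "Team1":       ["A", "A", "B", "B"],
    "Team2":       ["B", "C", "A", "C"],
    "WinningTeam": ["A", "C", "A", "B"],
})
team = "A"

# Correct: matches where the team won AND played at home, over all home matches
home_win = df[(df.WinningTeam == team) & (df.Team1 == team)].shape[0] / df[df.Team1 == team].shape[0] * 100
away_win = df[(df.WinningTeam == team) & (df.Team2 == team)].shape[0] / df[df.Team2 == team].shape[0] * 100

# Buggy: OR also counts home losses and away wins in the numerator
home_win_or = df[(df.WinningTeam == team) | (df.Team1 == team)].shape[0] / df[df.Team1 == team].shape[0] * 100

print(home_win, away_win, home_win_or)
```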
Week 5 - Numpy¶
Session 13 - Numpy Fundamentals¶
Description
Session 14 - Advanced Numpy¶
Description
Session 15 - Numpy Tricks¶
Description
Session on Web Development using Flask¶
Description
Code - https://github.com/campusx-official/nlp-web-app
HTML Playlist - https://www.youtube.com/watch?v=jp3gE2Ow6Fw&list=PLKnIA16_RmvaPjreiKXncoLCLQKE0I_9D&ab_channel=CampusX
CSS Playlist - https://www.youtube.com/watch?v=4d79CMy5-LI&list=PLKnIA16_RmvYz9J-59mtVWLQuPbsWd56P&ab_channel=CampusX
Fundamentals of Web Development - https://www.youtube.com/watch?v=XEq5gEhqPNE&list=PLKnIA16_RmvaAtO498fZOVVomyx01yZhx&ab_channel=CampusX
Task 13 Solutions¶
Description
Task 14 Solutions¶
Description
Task 15 Solutions¶
Description
Week 4 - Advanced Python¶
Session 10 - File Handling + Serialization & Deserialization¶
Description
Code - https://colab.research.google.com/drive/1TP7ks1pnEzJwwzHtswkSYvMWwo2HeRxM?usp=sharing
At timestamp 54:00: reading a big text file in chunks
with open('big.txt', 'r') as f:
    chunk_size = 10
    data = f.read(chunk_size)  # read at most chunk_size characters
    while len(data) > 0:       # read() returns '' at end of file
        print(data, end='****')
        data = f.read(chunk_size)
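The same loop can be written more compactly with the two-argument form of `iter()`, which keeps calling `f.read(chunk_size)` until it returns the empty string. This sketch wraps it in a generator so the caller decides what to do with each chunk. A temporary file with hypothetical contents stands in for `big.txt` so the example runs anywhere.

```python
import os
import tempfile
from functools import partial


def read_in_chunks(path: str, chunk_size: int = 10):
    """Yield successive chunks of at most chunk_size characters from a text file."""
    with open(path, "r") as f:
        # iter(callable, sentinel) stops as soon as f.read(chunk_size) returns ''
        for chunk in iter(partial(f.read, chunk_size), ""):
            yield chunk


# Demo on a temporary file instead of big.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("abcdefghijklmnopqrstuvwxyz")
    path = tmp.name

chunks = list(read_in_chunks(path, chunk_size=10))
print(chunks)
os.remove(path)
```

Only one chunk is held in memory at a time unless, as in this demo, the caller collects them into a list.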
Session 11 - Exception Handling¶
Description
Session 12 - Decorators & Namespaces¶
Description
Supplementary Session on Iterators¶
Description
Supplementary Session on Generators¶
Description
Session on Resume Building¶
Description
Session on GUI Development using Python [2nd Dec - Fri]¶
Description
Week 4 - Interview Questions¶
Description
Task 10 Solutions¶
Description
Task 11 Solutions¶
Description
Task 12 Solutions¶
Description
Week 3 - Object Oriented Programming(OOP)¶
Session 7 - OOP Part 1 | Class & Object¶
Description
Task 7 Solutions¶
Description
Session 8 - OOP Part 2 | Encapsulation & Static Keyword¶
Description
Task 8 Solutions¶
Description
Session 9 - OOP Part 3 | Inheritance & Polymorphism¶
Description
What is Abstraction | OOP Concept¶
Description
Task 9 Solutions¶
Description
Session on OOP Project¶
Description
Week 3 - Interview Questions¶
Description
Interview Questions class discussion -
https://colab.research.google.com/drive/1pFSCaenXUtrWRPgP4zTOcM_2GQIMjU4z?usp=sharing
More Interview Questions -
https://colab.research.google.com/drive/1LlTdY0LeYdI893EtSN3CqZOFg_bFxrPu?usp=sharing
Week 2 - Python Data Types¶
Session 4 - Lists in Python¶
Description
Task 4 Solutions¶
Description
Code - https://colab.research.google.com/drive/1uBqC9zOZH3e26WWc-R4dplohg2uvR-Xs?usp=sharing
Problem 14:
print([[row[i] for row in matrix] for i in range(len(matrix))])  # Only works for square matrices
# Updated code: loop over the number of columns, len(matrix[0]), so non-square matrices work too
print([[row[i] for row in matrix] for i in range(len(matrix[0]))])
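A quick check on a hypothetical 2x3 matrix shows the bug. `range(len(matrix))` counts rows, so the original version silently drops the last column (and on a matrix with more rows than columns it would raise IndexError). The built-in `zip(*matrix)` idiom gives the same result as the updated code.

```python
matrix = [[1, 2, 3],
          [4, 5, 6]]  # 2 rows x 3 columns (not square)

# Original version: range(len(matrix)) counts rows, so one column is lost
buggy = [[row[i] for row in matrix] for i in range(len(matrix))]

# Updated version: range(len(matrix[0])) counts columns
fixed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]

# Equivalent idiom: zip(*matrix) groups the i-th element of every row
with_zip = [list(col) for col in zip(*matrix)]

print(buggy)
print(fixed)
print(with_zip)
```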
Session 5 - Tuples + Sets + Dictionary¶
Description
Task 5 Solutions¶
Description
Session 6 - Functions in Python¶
Description
Task 6 Solutions¶
Description
Session on Array Interview Questions¶
Description
Code - https://colab.research.google.com/drive/1xUoy5AW_vlI92xbIcfEbnx0ZnGb7IKbj?usp=sharing
Timestamp 40:00 - Q10: Maximum Sum Subarray.
The code finds the correct best sum, but the subarray it prints is wrong.
This happens because of list referencing. Say we have a list a = [1, 2, 3] and write b = a: both names now refer to the same list, so any change made to a also shows up in b. If we instead assign b = a[:], b is a separate copy, and changing a no longer changes b.
Correction in Approach 1:
d[sum(subarray)] = subarray[:] # Cloning will solve this.
Correction in Approach 2:
best_seq = curr_seq[:]
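The aliasing problem can be reproduced on its own. The function below is a sketch of a Kadane-style scan that also tracks the subarray. It is not necessarily identical to Approach 2 from the session, but it relies on the same `best_seq = curr_seq[:]` fix. It assumes a non-empty input list.

```python
a = [1, 2, 3]
b = a        # b is another name for the same list
c = a[:]     # c is an independent copy
a.append(4)
print(b, c)


def max_subarray(nums):
    """Kadane-style scan returning (best_sum, best_subarray) for a non-empty list."""
    best_sum, best_seq = nums[0], [nums[0]]
    curr_sum, curr_seq = 0, []
    for x in nums:
        if curr_sum <= 0:
            # A non-positive running sum cannot help, so restart at x
            curr_sum, curr_seq = x, [x]
        else:
            curr_sum += x
            curr_seq.append(x)  # mutates curr_seq in place
        if curr_sum > best_sum:
            # Copy: a plain `best_seq = curr_seq` would change with later appends
            best_sum, best_seq = curr_sum, curr_seq[:]
    return best_sum, best_seq


print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))
```

Without the slice copy, the final subarray here would come out as `[4, -1, 2, 1, -5, 4]`, because the `-5` and `4` appended after the best sum was recorded would leak into it.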
Week 2 - Interview Questions¶
Description
Week 1 - Basics of Python Programming¶
Session 1 - Python Basics¶
Description
Code used in the session - https://colab.research.google.com/drive/10jVbuKq2Owsz_hIIrXA9Y09DMLdDt_21?usp=sharing
Session 2 - Operators + If-Else + Loops¶
Description
Code for the session - https://colab.research.google.com/drive/1dJIncqudN2wFNZ1P3_1sdJX76pzw-s-4?usp=sharing
Session Code (Updated) : https://colab.research.google.com/drive/1He-CC_4GUaswgQ2NFg8exc23A-mChTwK?usp=sharing
Week 1 - Task 1 + Task 2 Solutions¶
Description
Code for Task 1 Solution - https://colab.research.google.com/drive/15ouziM6EkwvOYIJM_Z4kX9AUJ6TmmZnh?usp=sharing
Code for Task 2 Solution - (Updated Link) - https://colab.research.google.com/drive/1mkBnQb0IELTQCpQ9nhtEYhqOgG84DTVd?usp=share_link
Session 3 - Python Strings¶
Description
Programming Problems on Strings¶
Description
Week 1 Task 3 Solutions¶
Description
How to Build a Portfolio Website for Data Science¶
Description
Portfolio Website Example - https://www.kunalgohrani.info/
HTML playlist - https://www.youtube.com/watch?v=jp3gE2Ow6Fw&list=PLKnIA16_RmvaPjreiKXncoLCLQKE0I_9D&index=1&t=0s&ab_channel=CampusX
CSS Playlist - https://www.youtube.com/watch?v=4d79CMy5-LI&list=PLKnIA16_RmvYz9J-59mtVWLQuPbsWd56P&index=1&t=0s&ab_channel=CampusX