100 Days of Data Science Code

This repository tracks my 100 Days of Code challenge to learn data science and machine learning from scratch. The plan covers:

Learning Fundamentals of Python
Python Libraries for Data Science
Data Manipulation and Preprocessing
Machine Learning Basics
Advanced Machine Learning Techniques
Deep Learning and Neural Networks
Model Evaluation and Deployment
Data Science Project and Wrap-Up

Articles Published on LinkedIn

Project 1: Bank Management System: Python, OOPS, and MySQL Database
25 Days Completion: Successful Completion of 25 Days in 100 Days of Data Science Code
50 Days Completion: Successful Completion of 50 Days in 100 Days of Data Science Code

Calendar Progress

July 2023

Sun	Mon	Tues	Wed	Thurs	Fri	Sat
-	-	-	-	-	-	1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18 ✅	19 ✅	20 ✅	21 ✅	22 ✅
23 ✅	24 ✅	25 ✅	26 ✅	27 ✅	28 ✅	29 ✅
30 ✅	31 ✅	-	-	-	-	-

August 2023

Sun	Mon	Tues	Wed	Thurs	Fri	Sat
-	-	1 ✅	2 ✅	3 ✅	4 ✅	5 ✅
6 ✅	7 ✅	8 ✅	9 ✅	10 ✅	11 ✅	12 ✅
13 ✅	14 ✅	15 ✅	16 ✅	17 ✅	18 ✅	19 ✅
20 ✅	21 ✅	22 ✅	23 ✅	24 ✅	25 ✅	26 ✅
27 ✅	28 ✅	29 ✅	30 ✅	31 ✅	-	-

September 2023

Sun	Mon	Tues	Wed	Thurs	Fri	Sat
-	-	-	-	-	1 ✅	2 ✅
3 ✅	4 ✅	5 ✅	6 ✅	7 ✅	8 ✅	9 ✅
10 ✅	11 ✅	12 ✅	13 ✅	14 ✅	15 ✅	16 ✅
17 ✅	18 ✅	19 ✅	20 ✅	21 ✅	22 ✅	23 ✅
24 ✅	25 ✅	26 ✅	27 ✅	28 ✅	29 ✅	30 ✅

October 2023

100 Days of Data Science Code Day-to-Day Progress

DAY 1 (18 July 2023):

Goal: Python Basics

Control flow statements like if-else conditions and loops.

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 2 (19 July 2023):

Goal: Functions and Modules

Concept of modules.
How to import and use built-in modules as well as create your own.

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 3 (20 July 2023):

Goal: Data Structures

Python's built-in data structures such as lists, tuples, dictionaries, and sets.
Also, learn about indexing, slicing, and manipulating these data structures.
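
A minimal sketch of the indexing, slicing, and in-place changes covered on this day. The sample values and variable names are made up for illustration and are not taken from the repository.

```python
# Built-in data structures with made-up sample values.
scores = [72, 85, 90, 66, 78]          # list: ordered and mutable
point = (3, 4)                         # tuple: ordered and immutable
ages = {"asha": 31, "ben": 27}         # dict: maps keys to values
tags = {"python", "numpy", "python"}   # set: duplicates collapse

first = scores[0]                # indexing from the front
last = scores[-1]                # a negative index counts from the end
middle = scores[1:4]             # slice: start inclusive, stop exclusive
reversed_scores = scores[::-1]   # a step of -1 returns a reversed copy

scores.append(95)        # lists can grow in place
ages["cara"] = 40        # add a new key to the dict
tags.add("pandas")       # add a new member to the set
```

Tuples have no equivalent of `append`; trying to assign to `point[0]` raises a `TypeError`.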

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 4 (21 July 2023):

Goal: File Handling and Exception Handling

Read from and write to files in Python.
Learn about exception handling and how to handle errors using try-except blocks.
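
One way the two topics fit together is sketched below: a helper that reads a file and falls back to an empty list when the file is missing. The function name `read_lines` and the file names are hypothetical.

```python
import os
import tempfile


def read_lines(path):
    """Return the lines of a text file, or an empty list if it is missing."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return [line.rstrip("\n") for line in handle]
    except FileNotFoundError:
        # Only the expected error is caught; anything else still propagates.
        return []


# Write two lines to a temporary file, then read them back.
tmp_dir = tempfile.mkdtemp()
notes_path = os.path.join(tmp_dir, "notes.txt")
with open(notes_path, "w", encoding="utf-8") as handle:
    handle.write("day 4\nfile handling\n")

lines = read_lines(notes_path)
missing = read_lines(os.path.join(tmp_dir, "does_not_exist.txt"))
```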

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 5 (22 July 2023):

Goal: Python Classes and Objects

Class Declaration
Object Instantiation
Constructor and Destructor
Built-in Class Attributes and Functions
Instance, Class and Static Variables and Functions.
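
The difference between instance, class, and static members can be sketched as follows. `Account` is a hypothetical class written for this note, not the one used in the Bank Management System project.

```python
class Account:
    """Hypothetical example class; not taken from the repository."""

    bank_name = "Example Bank"   # class variable shared by every instance
    opened = 0                   # class-level counter

    def __init__(self, owner, balance=0):
        # The constructor sets instance variables, unique to each object.
        self.owner = owner
        self.balance = balance
        Account.opened += 1

    def deposit(self, amount):
        # Instance method: works on one object's state.
        if not Account.is_valid_amount(amount):
            raise ValueError("amount must be positive")
        self.balance += amount
        return self.balance

    @classmethod
    def total_opened(cls):
        # Class method: receives the class itself, not an instance.
        return cls.opened

    @staticmethod
    def is_valid_amount(amount):
        # Static method: a plain function namespaced under the class.
        return amount > 0


first = Account("asha", 100)
second = Account("ben")
first.deposit(50)
```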

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 6 (23 July 2023):

Goal: Python OOPs Concepts and Implementation in Python

Data Abstraction
Encapsulation
Inheritance
Polymorphism.

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 7 (24 July 2023):

Goal: Advanced Python Concepts

Higher Order Functions
List Comprehensions
Regular Expressions (RegEx)
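
A short sketch touching all three topics. The numbers, the sentence, and the date pattern are made up for illustration.

```python
import re
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# Higher-order functions take other functions as arguments.
squares = list(map(lambda n: n * n, numbers))
evens = list(filter(lambda n: n % 2 == 0, numbers))
total = reduce(lambda acc, n: acc + n, numbers, 0)

# A list comprehension expresses the same map + filter in one expression.
even_squares = [n * n for n in numbers if n % 2 == 0]

# A regular expression that pulls dates shaped like "18 July 2023" out of text.
text = "Started on 18 July 2023 and reached day 50 on 5 Sept 2023."
dates = re.findall(r"\b\d{1,2} [A-Z][a-z]+ \d{4}\b", text)
```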

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 8 (25 July 2023):

Goal: Python Connectivity with MySQL Database

Setting Up MySQL Connection
Executing SQL Queries.

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 9 (26 July 2023):

Goal: Day 1 of Bank Management System

Database Setup
Python Environment Setup
Database Connectivity
Create Basic Classes
Customer Management.

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 10 (27 July 2023):

Goal: Day 2 of Bank Management System

Account Management (Create Account, List Account Details)
Basic Error Handling (Apply Validations on Input Values)
Testing and Debugging (Checking Input Value Validations).

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 11 (28 July 2023):

Goal: Final Day of Project (Transfer Operations and Final Testing)

Transfer Operation
Final Testing and Documentation
Clean Up and Deployment.

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 12 (29 July 2023):

Goal: NumPy Basics and Array Manipulation

Introduction to NumPy
Installing NumPy
Creating NumPy arrays
Array indexing and slicing
Array reshaping and resizing
Stacking and splitting arrays.
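
The array operations listed above can be sketched on one small array. This assumes NumPy is installed; the array contents are arbitrary.

```python
import numpy as np

a = np.arange(12)            # 1-D array holding 0..11
grid = a.reshape(3, 4)       # same data viewed as 3 rows x 4 columns

row = grid[1]                # second row
col = grid[:, 2]             # third column
block = grid[:2, 1:3]        # first two rows, columns 1 and 2

stacked_v = np.vstack([grid, grid])   # stack along rows
stacked_h = np.hstack([grid, grid])   # stack along columns
left, right = np.hsplit(grid, 2)      # split into two equal column blocks
```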

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 13 (30 July 2023):

Goal: Mathematical Operations with NumPy

Element-wise Operations
Aggregation Functions
Linear Algebra with NumPy.
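
A small sketch contrasting element-wise operations with matrix operations, assuming NumPy is installed. The matrices are arbitrary examples.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

elementwise = a * b            # multiplies matching entries
matmul = a @ b                 # true matrix product

col_sums = a.sum(axis=0)       # aggregate down each column
row_means = a.mean(axis=1)     # aggregate across each row

det = np.linalg.det(a)                           # determinant of a
x = np.linalg.solve(a, np.array([5.0, 11.0]))    # solves a @ x = [5, 11]
```

Note that `*` and `@` give different results on the same inputs, which is a common source of bugs when translating formulas into code.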

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 14 (31 July 2023):

Goal: Statistics Functions with NumPy

Descriptive statistics
Random number generation
Sorting and searching arrays

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 15 (1 Aug. 2023):

Goal: Introduction to Pandas and Data Structures in Pandas

Introduction to Pandas
Install Pandas
Types of Data Structures : Series, DataFrames
Importing and Exporting DataFrames
DataFrame Functions
Accessing DataFrames : Indexing, Slicing, loc[], iloc[].

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 16 (2 Aug. 2023):

Goal: Data Manipulation and Data Aggregation using Pandas

Advanced Indexing and Selection - (Label-based indexing, boolean indexing, and advanced slicing)
Combining DataFrames - (Concatenation, merging, and joining techniques)
Data Manipulation
Advanced Data Manipulation - (reshaping data, pivoting, and melting)
Data Aggregation and Grouping - (groupby() and other aggregation Functions)
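
A compact sketch of boolean indexing, grouping, merging, and melting, assuming pandas is installed. The `sales`, `managers`, and `wide` tables are invented for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, 4, 6, 8],
})
managers = pd.DataFrame({
    "region": ["north", "south"],
    "manager": ["Asha", "Ben"],
})

# Boolean indexing keeps only the rows where the condition is True.
big_orders = sales[sales["units"] > 5]

# groupby + aggregation: total units per region.
totals = sales.groupby("region", as_index=False)["units"].sum()

# Merge the totals with a lookup table on the shared key.
merged = totals.merge(managers, on="region", how="left")

# melt reshapes a wide table (one column per quarter) into a long one.
wide = pd.DataFrame({"region": ["north", "south"], "q1": [1, 2], "q2": [3, 4]})
long = wide.melt(id_vars="region", var_name="quarter", value_name="units")
```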

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 17 (3 Aug. 2023):

Goal: Data Cleaning

Basic Data Cleaning and Pre-Processing:
- Removing Duplicates
- Fixing Wrong Data
- Cleaning Data of Wrong Format
- Cleaning Empty Cells
- dropna(), fillna()
- drop_duplicates()
Data Transformation - ( apply() and map() )
Working with Text Data - Functions of str attribute
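
The cleaning functions named above can be sketched on a tiny table, assuming pandas is installed. The `raw` data and the age bands are made up for illustration.

```python
import pandas as pd

raw = pd.DataFrame({
    "name": [" asha ", "Ben", "Ben", "cara"],
    "age": [31, 27, 27, None],
})

# Remove exact duplicate rows (the second "Ben" row).
deduped = raw.drop_duplicates()

# Two ways to handle the empty cell: drop the row, or fill it in.
dropped = deduped.dropna()
filled = deduped.fillna({"age": deduped["age"].mean()})

# str methods fix text formatting column-wide.
names = filled["name"].str.strip().str.title()

# apply() runs a function on every value of a column.
age_band = filled["age"].apply(lambda age: "30+" if age >= 30 else "under 30")
```

Filling with the column mean is only one possible choice; the right fill value depends on the dataset.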

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 18 (4 Aug. 2023):

Goal: Feature Engineering and Time Series Analysis

Feature Engineering:
- Data Normalization
- Data Scaling
- Data Standardization
Time Series Analysis and Resampling:
- Working with datetime data
- Date offsets
- Resampling time series data
- Datetime index
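
A minimal sketch of both halves of this day, assuming pandas is installed. It treats "normalization" as min-max scaling to the range 0 to 1 and "standardization" as z-scores, which is a common usage but an assumption about what the notes mean. The values are made up.

```python
import pandas as pd

values = pd.Series([10.0, 20.0, 30.0, 40.0])

# Min-max scaling: smallest value becomes 0, largest becomes 1.
min_max = (values - values.min()) / (values.max() - values.min())

# Standardization: subtract the mean, divide by the (population) std.
z_scores = (values - values.mean()) / values.std(ddof=0)

# A daily series with a DatetimeIndex, resampled into 2-day totals.
index = pd.date_range("2023-08-01", periods=6, freq="D")
daily = pd.Series([1, 2, 3, 4, 5, 6], index=index)
two_day_totals = daily.resample("2D").sum()
```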

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 19 (5 Aug. 2023):

Goal: Matplotlib Introduction and Line Plots

Matplotlib:
- Installation of Matplotlib library
- Import Matplotlib library
Matplotlib Pyplot:
- Plotting x and y points
- Plotting without line
- Matplotlib Markers (Types, Color, Size)
- Matplotlib Line (LineStyle, Line colors, line width)
- Single Plot with multiple lines
- Matplotlib Labels and Title (Create Label, Create Title, Set font properties to Title and Label, Title Position)
- Adding Grid Lines (Line Properties of grid)
Matplotlib Bars:
- Vertical Bars
- Horizontal Bars
- Bar colors
- Bar width
- Bar height

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 20 (6 Aug. 2023):

Goal: Matplotlib Scatter Plot and Histogram

Subplots:
- subplot() function
- Title for each subplot
- Super title of Plot
Matplotlib Scatter Plot:
- Create Scatter Plots
- Compare Plots
- Color each dot
- ColorMap for dots
- Combine Color, Size and Alpha values
Matplotlib Histograms:
- Create Histogram
Matplotlib Pie Charts:
- Create Pie Chart
- Labels
- startAngle
- Explode
- Shadow
- Colors
- Legend
- Header

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 21 (7 Aug. 2023):

Goal: Seaborn Introduction

Seaborn:
- Installation of Seaborn
- Import Seaborn library
Different types of plots:
- Relational Plots
- Categorical Plots
- Distribution Plots
- Regression Plots
Categorical Plots:
- Bar Plot
- Count Plot
- Box Plot
- Violinplot
- Stripplot
- Swarmplot
- Factorplot
Distribution Plots:
- Histogram
- Distplot
- Jointplot
- Pairplot
- Rugplot
- KDE Plot

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 22 (8 Aug. 2023):

Goal: Seaborn Visualization Plots - Relational and Regression Plots

Customizing Seaborn Plots:
- Changing Figure Aesthetics
- Removal of Spines
- Changing the Figure size
- Scaling the plots
- Setting the Style Temporarily
- Color Palette - (Diverging, Sequential, Default color palette)
Multiple Plots with Seaborn:
- Using Matplotlib - (add_axes(), subplot(), subplot2grid() functions)
- Using Seaborn - (FacetGrid() method, PairGrid() method)
Relational Plot Types:
- relplot()
- Scatter Plot
- Line Plot
Regression Plot Types:
- lmplot
- RegPlot
Matrix Plots:
- HeatMap
- Clustermap

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 23 (9 Aug. 2023):

Goal: Python Fundamentals Notes

Introduction
- Features
- Applications
Identifiers:
- Keywords
- Variables and Constants
Operators in python
Data types in python
- String data type and operations
- List data type and operations
- Tuple data type and operations
- Set data type and operations
- Dictionary data type and operations
Control Statements in python:
- Decision making
- looping statements
- looping control statements

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 24 (10 Aug. 2023):

Goal: NumPy Fundamentals Notes

Introduction
- Installation
- Import
Create arrays in python
Array creation using NumPy Functions
- zeros
- ones
- arange
- linspace
- eye
- identity
- fromiter
Accessing array elements
- Indexing and Slicing
Random number Generation
- rand()
- random()
- ranf()
- randint()
- randn()

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 25 (11 Aug 2023):

Goal: Pandas Revision

Introduction - Install, Import
Data Structures:
- Series
- DataFrames
DataFrames
- Importing and Exporting
- Functions - columns, describe(), info(), head(), tail(), isna()
- Accessing DataFrames - loc[], iloc[]
Basic Data Cleaning:
- Empty Cells
- Wrong Format Data
- Fixing Wrong Data
- Removing Duplicates
Apply filters
- apply()
- map() - Using Dictionary, Series, Function for mapping

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 26 (12 Aug 2023):

Goal: Introduction to Artificial Intelligence and Machine Learning Fundamentals

Artificial Intelligence:
Machine Learning:
- Difference between Artificial Intelligence and Machine Learning
- Applications of Machine Learning
- Limitations of Machine Learning
- Types of Machine Learning
  - Supervised Learning
  - Unsupervised Learning
  - Reinforcement Learning
- Comparisons between all types

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 27 (13 Aug 2023):

Goal: Understanding Machine Learning Workflow

1. Data Preprocessing:
- Data Cleaning
- Feature Selection/Extraction
- Normalization/Scaling
- Encoding Categorical Variables
- Splitting Data
2. Model Training:
- Selecting a Model
- Initializing Parameters
- Training Loop
- Gradient Descent (for Optimization)
- Hyperparameter Tuning
3. Model Evaluation:
- Metrics
- Cross-Validation
- Confusion Matrix
- ROC and AUC
- Overfitting and Underfitting

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 28 (14 Aug 2023):

Goal: Model Evaluation Techniques in Machine Learning

Cross-Validation
Evaluation Metrics:
- Accuracy
- Precision
- Recall
- F1-Score
- Area Under Curve (AUC) and Receiver Operating Characteristic (ROC)
Confusion Matrix
Overfitting and Underfitting Detection:
- Overfitting
- Underfitting
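
How the metrics above follow from the confusion matrix can be sketched in plain Python. The labels and predictions are made up, and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Derive accuracy, precision, recall and F1 from the four counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(y_true, y_pred)
```

Here accuracy is 0.7 while recall is only 0.5, which shows why accuracy alone can hide weak performance on the positive class.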

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 29 (15 Aug 2023):

Goal: Diagnosing and Addressing Underfitting and Overfitting

Underfitting:
- Choosing a more complex model
- Adding more features
- Fine-tuning hyperparameters
Overfitting:
- Collect more data
- Feature selection
- Cross-validation
- Regularization techniques
- Early stopping

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 30 (16 Aug 2023):

Goal: Simple Linear Regression Implementation

Linear Regression Introduction
Simple Linear Regression:
- Assumptions of Simple LR
- Equation of Simple LR
- Applications of Linear Regression
- Working of Linear Regression
- Finding goodness of fit
- Examples of Linear Regression
- Implementation of Simple Linear Regression
- Real-world Application: Salary Prediction
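
The equation y = b0 + b1 * x and its goodness of fit can be sketched with the closed-form least-squares solution. The experience and salary numbers are invented and are not the dataset used in the notebook.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


years = [1, 2, 3, 4, 5]          # made-up years of experience
salary = [32, 34, 41, 44, 49]    # made-up salary, in thousands

slope, intercept = fit_simple_linear_regression(years, salary)
r2 = r_squared(years, salary, slope, intercept)
predicted_6_years = intercept + slope * 6
```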

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 31 (17 Aug 2023):

Goal: Multiple Linear Regression and Implementation using Student Performance Analysis

Multiple Linear Regression (MLR):
- Key points of MLR
- Equation of MLR
- Assumptions of MLR
- Implementation of MLR using Python
- Real-world Application: Student Performance Analysis

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 32 (18 Aug 2023):

Goal: Classification in Machine Learning

Classification
Types of Learners:
- Lazy Learners: Store the training dataset and wait until the test dataset arrives before doing any classification work.
- Eager Learners: Build a classification model from the training dataset before receiving the test dataset.
Types of Classification Algorithms:
- Logistic Regression
- Decision Trees
- Random Forest
- Support Vector Machines (SVM)
- K-Nearest Neighbors (KNN)
- Naive Bayes
- Neural Networks
Terminologies in Classification:
- Features and Labels
- Training and Testing Data
- Confusion Matrix
- Precision, Recall, F1-Score
- ROC and AUC Curve
Types of Classification:
- Binary Classification: Two classes (e.g., Yes/No)
- Multiclass Classification: Multiple distinct classes (e.g., Cat/Dog/Horse)
Model Evaluation Techniques for Classification (used to assess the goodness of a model's fit):
- Accuracy
- Precision and Recall
- F1-Score
- ROC Curve and AUC
- Confusion Matrix

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 33 (19 Aug 2023):

Goal: Logistic Regression Implementation

Logistic Regression:
- Logistic Function (Sigmoid Function)
- Assumptions of Logistic Regression
- Types of Logistic Regression:
  - Binary / Binomial
  - Multinomial
  - Ordinal
- Terminologies involved in Logistic Regression
- Implementation of Logistic Regression
Difference between Linear Regression and Logistic Regression
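
The role of the sigmoid function can be sketched as follows: a linear score is squashed into a probability, then thresholded into a class. The weights, bias, and feature values are chosen by hand for illustration, not learned from data.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, weights, bias, threshold=0.5):
    """Binary prediction from a linear score passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    return probability, int(probability >= threshold)


# Hypothetical weights and bias, for illustration only.
probability, label = predict([2.0, 1.0], weights=[1.5, -0.5], bias=-1.0)
```

This is the key contrast with linear regression: the same linear score is used, but the output is a probability between 0 and 1 rather than an unbounded number.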

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 34 (20 Aug 2023):

Goal: Decision Tree Concepts

Decision Tree:
- Components of a Decision Tree
  - Root Node
  - Internal Nodes
  - Leaf Nodes
- Attribute Selection Measures(ASM):
  - Entropy
  - Information Gain
  - Gini Index
- How Decision Trees Work
- Advantages of Decision Trees
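
The attribute selection measures above can be sketched in plain Python. The label lists are toy data and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def gini(labels):
    """Gini index: chance of mislabeling a randomly drawn sample."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2
                     for count in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 5 + ["no"] * 5
mixed_split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
perfect_split = [["yes"] * 5, ["no"] * 5]

gain_mixed = information_gain(parent, mixed_split)
gain_perfect = information_gain(parent, perfect_split)
```

A tree-building algorithm would compute such a score for every candidate split and pick the one with the highest gain (or lowest Gini index).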

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 35 (21 Aug 2023):

Goal: Decision Tree Implementation

Decision Tree Implementation Setup:
- Data Pre-processing
- Model Training
- Predicting the Results
- Model Evaluation Techniques
Examples for Decision Tree Implementation:
- IRIS Flower Classification
- Red Wine Quality Prediction

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 36 (22 Aug 2023):

Goal: Ensemble Methods

Ensemble Methods:
- Bagging
- Boosting
- Stacking
- Advantages of Ensemble Methods

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 37 (23 Aug 2023):

Goal: Gradient Boosting in Machine Learning

Gradient Boosting in Machine Learning:
- What is Gradient Boosting
- Key Components of Gradient Boosting
- How Gradient Boosting Works
- Benefits of Gradient Boosting

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 38 (24 Aug 2023):

Goal: AdaBoost and XGBoost

AdaBoost and XGBoost:
- AdaBoost (Adaptive Boosting)
- XGBoost (Extreme Gradient Boosting)
- Advantages of AdaBoost and XGBoost
- Applications

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 39 (25 Aug 2023):

Goal: Random Forests Introduction

Random Forests:
- What are Random Forests
- Key Components of Random Forests
- How Random Forests Work
- Benefits of Random Forests
- Real-world Applications of Random Forests

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 40 (26 Aug 2023):

Goal: Random Forest Implementation and Hyperparameter Tuning

Random Forest Implementation:
- Step-by-Step Approach
- IRIS Flower Prediction
- Red Wine Quality Prediction
Hyperparameter Tuning:
- Unlocking Model Potential
- GridSearchCV
- RandomizedSearchCV
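
A minimal sketch of grid search over a random forest, assuming scikit-learn is installed. The parameter grid is deliberately tiny and chosen for illustration; it is not the grid used in the notebooks.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Every combination in the grid is trained and scored with 3-fold CV.
param_grid = {"n_estimators": [10, 50], "max_depth": [2, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_   # the winning combination
best_score = search.best_score_     # its mean cross-validated accuracy
```

`RandomizedSearchCV` takes a similar form but samples a fixed number of combinations instead of trying them all, which helps when the grid is large.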

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 41 (27 Aug 2023):

Goal: Decision Tree and Random Forest Example

Decision Tree in Action
Enchantment of Random Forests
Social Media Ads prediction

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 42 (28 Aug 2023):

Goal: Support Vector Machine (SVM) Introduction

Introduction to SVM
Terminologies used in SVM
Advantages of SVM
Limitations of SVM

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 43 (29 Aug 2023):

Goal: SVM Implementation

SVM Implementation:
- Linear SVM (Social Media Ads) : Kaggle Notebook
- Non-Linear SVM (IRIS Flower Prediction) : Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 44 (30 Aug 2023):

Goal: SVM Regression Implementation

SVM Regression Implementation:
- Salary Prediction : Kaggle Notebook
- Boston Housing Price Prediction : Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 45 (31 Aug 2023):

Goal: Introduction to KNN

KNN Introduction
Distance Metrics:
- Euclidean Distance
- Manhattan Distance
- Minkowski Distance
How KNN works
How to choose value of 'K'
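
The three distance metrics and the voting step can be sketched in plain Python. The training points, labels, and function names are made up for illustration.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p between two points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a, b):
    return minkowski(a, b, 1)   # p = 1


def euclidean(a, b):
    return minkowski(a, b, 2)   # p = 2


def knn_predict(train, query, k=3, distance=euclidean):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
         ((6, 6), "b"), ((7, 6), "b")]

near_a = knn_predict(train, (2, 2), k=3)
near_b = knn_predict(train, (6, 7), k=3)
```

An odd `k` avoids tied votes in a two-class problem; in practice `k` is usually picked by comparing validation scores across several candidates.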

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 46 (1 Sept 2023):

Goal: KNN Implementation

KNN Classification:
- IRIS Flower Prediction : Kaggle Notebook
- Mushroom Classification : Kaggle Notebook

KNN Regression:
- Employee Salary Prediction : Kaggle Notebook
- Student Performance Prediction : Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 47 (2 Sept 2023):

Goal: KNN Hyperparameter Tuning

KNN Regression:
- House Price Prediction : Kaggle Notebook
KNN Classification:
- BMI Classification : Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 48 (3 Sept 2023):

Goal: ML Fundamentals Revision

What is AI
What is ML
Machine Learning
Model Evaluation Techniques in ML
- Classification: Accuracy Score, Confusion Matrix, Classification Report
- Regression: Mean Absolute Error, Mean Squared Error, Root Mean Squared Error
Exploratory Data Analysis (EDA)
Handling Outliers
- Removing Outliers
- Transforming Values

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 49 (4 Sept 2023):

Goal: 5G Resource Allocation Capstone Project - MLR, SVR and KNN Regression Models

Resource Allocation in 5G Network Service Project:
- Data Pre-Processing
- Implementation:
  - Polynomial Regression
  - SVM Regression
  - KNN Regression
- Model Evaluation:
  1. Mean Absolute Error
  2. Mean Squared Error
  3. Root Mean Squared Error
- Kaggle Notebook : Link to Notebook
- Comparison of Model Performances (Multiple Bar Charts)

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 50 (5 Sept 2023):

Goal: Capstone Project - Gender Classification - LR, DT, RF, SVM and KNN

Gender Classification Project:
- Data Pre-Processing
- Implementation:
  - Logistic Regression
  - Decision Tree
  - Random Forest
  - SVM Classification
  - KNN Classification
- Model Evaluation:
  1. Accuracy Score
  2. Confusion Matrix
  3. Classification Report
- Kaggle Notebook : Link to Notebook
- Comparison of Model Performances (Bar Chart)

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 51 (6 Sept 2023):

Goal: Introduction to Cross-Validation

Introduction to Cross-Validation
What is Cross Validation
Why is Cross Validation Important
Advantages of Cross Validation
Limitations of Cross Validation
Types of Cross-Validation:
- Leave-One-Out Cross-Validation (LOOCV)
- Leave-P-Out Cross Validation (LPOCV)
- K-Fold Cross-Validation
- Stratified K-Fold Cross-Validation
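
How K-Fold splits the data can be sketched in plain Python. This version uses contiguous folds with no shuffling and no stratification, and `k_fold_indices` is a hypothetical helper; scikit-learn's `KFold` and `StratifiedKFold` are the usual tools in practice.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


folds = list(k_fold_indices(10, 3))
test_folds = [test for _, test in folds]
```

Each sample lands in exactly one test fold, so every observation is used for both training and evaluation across the k rounds.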

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 52 (7 Sept 2023):

Goal: Cross-Validation Implementation

K-Fold Cross-Validation : Kaggle Notebook
Stratified K-Fold Cross-Validation : Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 53 (8 Sept 2023):

Goal: Perform EDA Operation on Different Datasets

EDA on Walmart Dataset : Kaggle Notebook
EDA on Fifa 19 Dataset : Kaggle Notebook
EDA on Restaurant Dataset : Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 54 (9 Sept 2023):

Goal: Introduction to Dimensionality Reduction

The Curse of Dimensionality
The Importance of Dimensionality Reduction
Dimensionality Reduction Techniques:
- Feature Selection
- Feature Extraction
- Dimension Reduction

Advantages of Dimensionality Reduction
Limitations of Dimensionality Reduction

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 55 (10 Sept 2023):

Goal: Introduction to Principal Component Analysis (PCA)

Some common terms used in PCA algorithm
Uses of PCA
Advantages of Principal Component Analysis
Limitations of Principal Component Analysis

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 56 (11 Sept 2023):

Goal: Steps in PCA (Principal Component Analysis)

Step 1 : Covariance Matrix Computation
Step 2 : Compute Eigenvalues and Eigenvectors of Covariance Matrix to Identify Principal Components

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 57 (12 Sept 2023):

Goal: Solve Example of PCA

Pre-processed Data
Calculated Covariance Matrix
Eigenvalues and Eigenvectors
Sorted Eigenvalues
Select Principal Components
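
The steps listed above can be sketched with NumPy, assuming it is installed. The five data points are arbitrary and are not the worked example from the notes.

```python
import numpy as np

X = np.array([[2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 6.0], [6.0, 8.0]])

# Pre-process: center each feature on zero.
centered = X - X.mean(axis=0)

# Covariance matrix of the features (columns).
cov = np.cov(centered, rowvar=False)

# Eigen-decomposition; eigh suits symmetric matrices such as cov.
eigvals, eigvecs = np.linalg.eigh(cov)

# Sort eigenvalues (and matching eigenvectors) from largest to smallest.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Select the first principal component and project the data onto it.
projected = centered @ eigvecs[:, :1]
explained_ratio = eigvals / eigvals.sum()
```

The variance of the projected data equals the largest eigenvalue, which is why eigenvalues are read as "variance explained" by each component.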

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 58 (13 Sept 2023):

Goal: PCA Implementation using Scikit-Learn

Data Preparation
Importing Scikit-learn
Standardization
PCA Implementation
Explained Variance
Dimensionality Reduction
Visualization
Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 59 (14 Sept 2023):

Goal: Introduction to Feature Selection

What is Feature Selection?
Why is Feature Selection Necessary?
Techniques in Feature Selection
- Univariate feature selection
- Feature importance from tree-based models
- Recursive Feature Elimination (RFE)
- L1-based feature selection
- Correlation-based feature selection
Steps in Feature Selection:
- Data Pre-Processing
- Feature Scoring
- Feature Selection
Advantages of Feature Selection:
- Improved model performance
- Faster training and prediction
- Enhanced model interpretability
- Reduced risk of overfitting
- Easier visualization of data
Limitations of Feature Selection:
- It may result in information loss.
- It can be challenging to decide which features to select.
- Some methods might not work well for all types of data.

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 60 (15 Sept 2023):

Goal: Feature Selection : Filter Methods

Introduction to Filter Methods
Steps in Filter Methods:
1. Data Pre-Processing
2. Feature Scoring
3. Feature Selection
Common Techniques in Filter Methods:
1. Correlation-based Feature Selection
2. Information Gain
3. Chi-square Test
4. Fisher's Score
5. Missing Value Ratio
Advantages of Filter Methods:
1. Simplicity
2. Speed
3. Independence
Limitations of Filter Methods:
1. Independence
2. Suboptimal Results
Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 61 (16 Sept 2023):

Goal: Feature Selection : Wrapper Methods

Introduction to Wrapper Methods
Steps in Wrapper Methods:
1. Subset Selection
2. Model Building
3. Model Evaluation
Common Techniques in Wrapper Methods:
1. Forward Selection Method
2. Backward Elimination Method
3. Exhaustive Feature Selection Method
4. Recursive Feature Elimination (RFE) Method
Advantages of Wrapper Methods:
1. Optimal Features
2. Model-Specific
Limitations of Wrapper Methods:
1. Computationally Intensive
2. Model Dependency
Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 62 (17 Sept 2023):

Goal: Feature Selection : Embedded Methods

Introduction to Embedded Methods
Steps in Embedded Methods:
1. Feature Selection While Building
2. Model Training
3. Feature Importance Assessment
Common Techniques in Embedded Methods:
1. Random Forest Importance
2. Lasso (L1 Regularization)
3. Ridge (L2 Regularization)
4. Elastic Net (L1 and L2 Regularization)
Advantages of Embedded Methods:
1. Feature Relevance
2. Model Compatibility
Limitations of Embedded Methods:
1. Model Dependency
2. May Miss Correlations
Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 63 (18 Sept 2023):

Goal: Exploratory Data Analysis (EDA) on IPL All Time Best Batsman Trending Dataset

Key EDA Operations Performed:
1. Data Loading
2. Data Exploration
3. Data Visualization
4. Statistical Insights
Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 64 (19 Sept 2023):

Goal: Support Vector Regression (SVR) on Used Car Price Prediction

Key SVR Operations Performed:
1. Data Loading
2. Data Pre-processing
3. Feature Selection
4. Splitting Data
5. SVR Model Building
6. Model Training
7. Model Evaluation
Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 65 (20 Sept 2023):

Goal: Movie Recommendations Using Collaborative Filtering

Key Operations Performed:
1. Data Loading
2. Data Pre-processing
3. Collaborative Filtering
4. Movie Recommendations
Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 66 (21 Sept 2023):

Goal: Simple Linear Regression for Insurance Predictions

Key Operations Performed:
1. Data Loading
2. Data Exploration
3. Linear Regression Implementation
4. Model Evaluation
Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 67 (22 Sept 2023):

Goal: Simple Linear Regression for Salary Predictions

Key Operations Performed:
1. Data Loading
2. Data Exploration
3. Linear Regression Implementation
4. Model Evaluation
Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 68 (23 Sept 2023):

Goal: Exploratory Data Analysis (EDA) for Gym Exercises Data

Key Operations Performed:
1. Data Loading
2. Data Exploration
3. Data Visualization
4. Insights Extraction
Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 69 (24 Sept 2023):

Goal: Exploratory Data Analysis (EDA) for Life Expectancy Data

Key Operations Performed:
1. Data Loading
2. Data Exploration
3. Data Visualization
4. Insights Extraction
Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 70 (25 Sept 2023):

Goal: Exploratory Data Analysis (EDA) on Predicting Student Dropouts

Key Operations Performed:
1. Data Loading
2. Data Exploration
3. Data Visualization
4. Insights Extraction
Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 71 (26 Sept 2023):

Goal: Introduction to Clustering in ML

Intro to Clustering
Types of Clustering:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 72 (27 Sept 2023):

Goal: Clustering Algorithms in Machine Learning

Commonly Used Clustering Algorithms:
- K-means Algorithm
- Hierarchical Clustering
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
- Agglomerative Clustering
- Gaussian Mixture Model (GMM)
Applications of Clustering:
- Customer Segmentation
- Image Compression
- Anomaly Detection
- Document Classification
Advantages of Clustering:
- Pattern Discovery
- Data Reduction
- Scalability
- Interpretability

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 73 (28 Sept 2023):

Goal: Implementing K-means Clustering

K-means Clustering:
- Initialization
- Assignment
- Update Centroids
- Repeat
Customer Clustering : Kaggle Notebook
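
The four K-means steps listed above can be sketched with NumPy, assuming it is installed. The points are toy data, and the initial centroids are passed in explicitly to keep the run deterministic; real implementations usually pick them at random or with k-means++.

```python
import numpy as np


def k_means(points, initial_centroids, max_iters=100):
    """Plain K-means: returns (labels, centroids). Expects max_iters >= 1."""
    centroids = np.array(initial_centroids, dtype=float)   # Initialization
    for _ in range(max_iters):
        # Assignment: each point goes to its nearest centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update: move each centroid to the mean of its assigned points.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j)
            else centroids[j]
            for j in range(len(centroids))
        ])
        # Repeat until the centroids stop moving.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                   [8.0, 8.0], [9.0, 9.0], [8.0, 9.5]])
labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])
```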

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 74 (29 Sept 2023):

Goal: K-means Clustering Implementation

K-means Clustering:
- Initialization
- Assignment
- Update Centroids
- Repeat
Credit Card Clustering : Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 75 (30 Sept 2023):

Goal: Visualizing Clusters Distribution for 30 Datasets

Visualize Clustering Exercises : Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 76 (1 Oct 2023):

Goal: Hierarchical Clustering Implementation

Online Retail Clustering : Kaggle Notebook

GitHub Repository: Source Code

LinkedIn post: Daily Update

DAY 77 (2 Oct 2023):

Goal: Hierarchical Clustering Concepts

What Can We Achieve with Hierarchical Clustering:
- Hierarchical Insights
- Data Exploration
- Decision Support

GitHub Repository: Source Code

LinkedIn post: Daily Update

73. Day 73 - K-means Implementation		73. Day 73 - K-means Implementation
74. Day 74 - K-Means Credit Card Clustering		74. Day 74 - K-Means Credit Card Clustering
75. Day 75 - Visualize Clusters Exercise		75. Day 75 - Visualize Clusters Exercise
76. Day 76 - Hierarchical Clustering		76. Day 76 - Hierarchical Clustering
77. Day 77 - Hierarchical Clustering		77. Day 77 - Hierarchical Clustering
README.md		README.md
data_science.jpg		data_science.jpg

mankarsnehal/100-Days-of-Code-Data-Science

