shhossain · shhossain · Sep 20, 2024 · Sep 20, 2024 · Sep 20, 2024
diff --git a/Data Science/readme.md b/Data Science/readme.md
@@ -20,10 +20,46 @@ EDA involves using statistical and visualization techniques to understand the da
 
 Feature engineering is the process of creating new features or modifying existing ones to improve model performance.
 
-### Machine Learning and Modeling
+## Machine Learning and Modeling
 
 Data Scientists build predictive models using machine learning algorithms. This involves splitting the data into training and testing sets, model selection, training, and evaluation.
 
+### Machine Learning Models
+
+There are two main types of models used for analyzing patterns and predicting behavior in data: Supervised learning and Unsupervised learning models. 
+
+The supervised models contain data that come with associated labels for identification. Unsupervised models, on the other hand, do not have labels and instead rely on deep learning methods that read patterns from groups of data and distinguish different elements and commonalities accordingly. 
+
+Through the methods of regression and classification, behaviors can be extracted from training datasets and tested on smaller datasets to predict a previously unknown value.
+
+#### KNN - K Nearest Neighbor
+
+The **KNN** model is one of the most straightforward, although relatively inefficient, supervised learning models. It uses *proximity* to classify a data point. After the training data is labeled with their class or numerical value, the distance is calculated and the algorithm identifies the **k** closest labeled examples to the queried data point. It then returns the *plurality* label of the k closest examples.
+
+#### Decision Trees
+
+A **Decision tree** is another supervised machine learning algorithm that relies on the relationship between nodes with dataset *feature values*. These work by splitting data into subsets based on feature values in a way that stems down and creates a tree-like visualization. Each node represents a decision point while each branch that stems from a node is that decision made. 
+
+A node with no branches holds the final outcome and is referred to as a *leaf node*; the node with no branches leading to it is the *root node* and is the starting decision. While can handle both numerical and categorical data and are fairly easy to visualize, they are prone to overfitting when not regulated properly.
+
+#### Linear Regression
+
+**Linear regression** is one of the most fundamental methods used for modeling a relationship between two or more variables, attempting to express that relationship as a linear equation.
+
+This method determines the coefficients that best minimize the *sum squared difference* between the observed values and the values predicted by a model. It is very handy for interpreting trends within data and ultimately making predictions.
+
+#### Clustering
+
+Another relatively straightforward algorithm and close cousin of the KNN is **clustering**, particularly the *k-means method*. This unsupervised machine learning model finds commonalities among subsets in data based on their characteristics without any previously established labels or categories. 
+
+There are other algorithms that work similarly, like the hierarchical clustering and DBSCAN methods, that approach the process of measuring similarity between data points differently. 
+
+#### Neural Networks
+
+One of the more commonly used methods today are deep learning **neural network models**. These machine learning models, inspired by the function and structure of the brain, consist of interconnected layers of nodes with their own associated weights. Data passes through these layers, transforming as they pass through mathematical functions, and eventually extracting complex patterns and predictions from datasets. 
+
+These algorithms work* particularly well for *large and high-dimensional data*, like image and speech recognition as well as language processing. A method called *backpropagation* is needed to adjust the weights based on the error of the networks predictions and with enough iterations and training, neural networks can produce amazing levels of accuracy in their predictions.
+
 ## Data Visualization
 
 Data visualization is crucial for communicating insights effectively. It uses charts, graphs, and plots to represent data visually. Common types of data visualizations include: