Master Machine Learning topics
- Get an introduction to the exciting, high-demand field of Machine Learning
- Gain applied experience in major areas of Machine Learning, including Prediction, Classification, Clustering, and Information Retrieval
- Learn to analyze large and complex datasets and create systems that adapt and improve over time
- Build intelligent applications that can make predictions from data
-
Week 2 - Regression: Predicting House Prices
-
Learning Objectives
- Describe the input (features) and output (real-valued predictions) of a regression model
- Calculate a goodness-of-fit metric (e.g., RSS)
- Estimate model parameters by minimizing RSS (algorithms to come...)
- Exploit the estimated model to form predictions
- Perform a training/test split of the data
- Analyze performance of various regression models in terms of test error
- Use test error to avoid overfitting when selecting amongst candidate models
- Describe a regression model using multiple features
- Describe other applications where regression is useful
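As a rough illustration of these objectives, here is a minimal NumPy sketch of fitting a simple model, splitting the data, and computing RSS. The `sqft`/`price` values are synthetic placeholders, not course data:

```python
import numpy as np

# Synthetic data: square footage vs. sale price (illustrative values only).
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 3500, size=200)
price = 150 * sqft + 25_000 + rng.normal(0, 20_000, size=200)

# 80/20 train/test split.
idx = rng.permutation(len(sqft))
train, test = idx[:160], idx[160:]

# Fit a simple linear model (slope + intercept) on the training set.
w1, w0 = np.polyfit(sqft[train], price[train], deg=1)

def rss(x, y):
    """Residual sum of squares: sum of squared prediction errors."""
    return np.sum((y - (w0 + w1 * x)) ** 2)

print("train RSS:", rss(sqft[train], price[train]))
print("test  RSS:", rss(sqft[test], price[test]))
```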
-
-
Week 3 - Classification: Analyzing Sentiment
-
Learning Objectives
- Identify a classification problem and some common applications
- Describe decision boundaries and linear classifiers
- Train a classifier
- Measure its error
- State some rules of thumb for what counts as good accuracy
- Interpret the types of error associated with classification
- Describe the tradeoffs between model bias and data set size
- Use class probability to express degree of confidence in prediction
-
-
Week 4 - Clustering and Similarity: Retrieving Documents
-
Learning Objectives
- Describe ways to represent a document (e.g., raw word counts, tf-idf,...)
- Measure the similarity between two documents
- Discuss issues related to using raw word counts
- Normalize counts to adjust for document length
- Emphasize important words using tf-idf
- Implement a nearest neighbor search for document retrieval
- Describe the input (unlabeled observations) and output (cluster labels) of a clustering algorithm
- Determine whether a task is supervised or unsupervised
- Cluster documents using k-means (algorithmic details to come...)
- Describe other applications of clustering
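As a rough illustration, here is a tiny NumPy sketch of tf-idf weighting and cosine-similarity nearest-neighbor retrieval over a made-up three-document corpus:

```python
import numpy as np
from collections import Counter

# Toy corpus (illustrative documents only).
docs = ["the cat sat on the mat",
        "the dog chased the cat",
        "dogs and cats are pets"]

vocab = sorted({w for d in docs for w in d.split()})
counts = np.array([[Counter(d.split())[w] for w in vocab] for d in docs], float)

# tf-idf: raw counts weighted by inverse document frequency.
df = (counts > 0).sum(axis=0)
idf = np.log(len(docs) / df)
tfidf = counts * idf

def nearest(query_idx, X):
    """Return the most similar other document by cosine similarity."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X[query_idx]
    sims[query_idx] = -np.inf          # exclude the query itself
    return int(np.argmax(sims))

print("most similar to doc 0:", nearest(0, tfidf))
```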
-
-
Week 5 - Recommending Products
-
Learning Objectives
- Describe the goal of a recommender system
- Provide examples of applications where recommender systems are useful
- Implement a co-occurrence based recommender system
- Describe the input (observations, number of “topics”) and output (“topic” vectors, predicted values) of a matrix factorization model
- Exploit estimated “topic” vectors (algorithms to come...) to make recommendations
- Describe the cold-start problem and ways to handle it (e.g., incorporating features)
- Analyze performance of various recommender systems in terms of precision and recall
- Use AUC or precision-at-k to select amongst candidate algorithms
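A minimal sketch of a co-occurrence based recommender on a made-up user-item purchase matrix: score unseen items by how often they co-occur with items the user already has.

```python
import numpy as np

# Toy purchase histories (illustrative): rows = users, columns = items.
purchases = np.array([[1, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 1, 1],
                      [1, 1, 1, 0]])

# Co-occurrence matrix: C[i, j] = number of users who bought both i and j.
C = purchases.T @ purchases
np.fill_diagonal(C, 0)

def recommend(user, k=2):
    """Score unseen items by summing co-occurrence with items the user owns."""
    owned = purchases[user].astype(bool)
    scores = C[owned].sum(axis=0).astype(float)
    scores[owned] = -np.inf            # never recommend something already owned
    return np.argsort(scores)[::-1][:k]

print("recommendations for user 0:", recommend(0))
```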
-
-
Week 6 - Deep Learning: Searching for Images
-
Learning Objectives
- Describe multi-layer neural network models
- Interpret the role of features as local detectors in computer vision
- Relate neural networks to hand-crafted image features
- Describe some settings where deep learning achieves significant performance boosts
- State the pros & cons of deep learning models
- Apply the notion of transfer learning
- Use neural network models trained in one domain as features for building a model in another domain
- Build an image retrieval tool using deep features
-
-
Week 1 - Simple Linear Regression
-
Learning Objectives
- Describe the input (features) and output (real-valued predictions) of a regression model
- Calculate a goodness-of-fit metric (e.g., RSS)
- Estimate model parameters to minimize RSS using gradient descent
- Interpret estimated model parameters
- Exploit the estimated model to form predictions
- Discuss the possible influence of high leverage points
- Describe intuitively how fitted line might change when assuming different goodness-of-fit metrics
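A minimal sketch of estimating the two parameters of a simple linear regression by gradient descent on RSS; the data and step size are illustrative assumptions:

```python
import numpy as np

# Illustrative 1-D regression data.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 5.0 + rng.normal(0, 1.0, size=100)

# Gradient descent on RSS(w0, w1) = sum_i (y_i - (w0 + w1 * x_i))^2.
w0, w1 = 0.0, 0.0
step = 1e-4
for _ in range(10_000):
    residual = y - (w0 + w1 * x)
    w0 += step * 2 * residual.sum()        # -dRSS/dw0 = 2 * sum(residual)
    w1 += step * 2 * (residual * x).sum()  # -dRSS/dw1 = 2 * sum(residual * x)

print(f"intercept ~ {w0:.2f}, slope ~ {w1:.2f}")  # should approach 5 and 3
```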
-
-
Week 2 - Multiple Regression
-
Learning Objectives
- Describe polynomial regression
- Detrend a time series using trend and seasonal components
- Write a regression model using multiple inputs or features thereof
- Cast both polynomial regression and regression with multiple inputs as regression with multiple features
- Calculate a goodness-of-fit metric (e.g., RSS)
- Estimate model parameters of a general multiple regression model to minimize RSS:
- In closed form
- Using an iterative gradient descent algorithm
- Interpret the coefficients of a non-featurized multiple regression fit
- Exploit the estimated model to form predictions
- Explain applications of multiple regression beyond house price modeling
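One way to see "polynomial regression as multiple regression" in code: build one feature column per power of the input and solve the least-squares normal equations in closed form. The data below is synthetic:

```python
import numpy as np

# Illustrative data: quadratic relationship between input and output.
rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=200)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.3, size=200)

# Cast polynomial regression as multiple regression: one column per feature.
H = np.column_stack([np.ones_like(x), x, x**2])

# Closed-form least squares: w = (H^T H)^{-1} H^T y
# (solved with a linear solver rather than an explicit inverse).
w = np.linalg.solve(H.T @ H, H.T @ y)
print("estimated coefficients:", w)   # close to [1.0, 2.0, -0.5]

# Predictions from the estimated model.
y_hat = H @ w
print("RSS:", np.sum((y - y_hat) ** 2))
```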
-
-
Week 3 - Assessing Performance
-
Learning Objectives
- Describe what a loss function is and give examples
- Contrast training, generalization, and test error
- Compute training and test error given a loss function
- Discuss issue of assessing performance on training set
- Describe tradeoffs in forming training/test splits
- List and interpret the 3 sources of average prediction error
- Irreducible error, bias, and variance
- Discuss the issue of selecting model complexity on test data and then using test error to assess generalization error
- Motivate the use of a validation set for selecting tuning parameters (e.g., model complexity)
- Describe overall regression workflow
-
-
Week 4 - Ridge Regression
-
Learning Objectives
- Describe what happens to magnitude of estimated coefficients when model is overfit
- Motivate form of ridge regression cost function
- Describe what happens to estimated coefficients of ridge regression as tuning parameter λ is varied
- Interpret coefficient path plot
- Estimate ridge regression parameters:
- In closed form
- Using an iterative gradient descent algorithm
- Implement K-fold cross validation to select the ridge regression tuning parameter λ
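A sketch of the closed-form ridge estimate and of how the coefficients shrink as the tuning parameter λ grows. The near-collinear features are made up so the shrinkage is visible:

```python
import numpy as np

def ridge_closed_form(H, y, lam):
    """Closed-form ridge estimate: (H^T H + lam * I)^{-1} H^T y.
    The identity is zeroed in the first slot so the intercept is not penalized."""
    I = np.eye(H.shape[1])
    I[0, 0] = 0.0
    return np.linalg.solve(H.T @ H + lam * I, H.T @ y)

# Illustrative data and feature matrix (intercept + two correlated features).
rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # nearly collinear with x1
y = 2.0 * x1 + rng.normal(scale=0.5, size=100)
H = np.column_stack([np.ones_like(x1), x1, x2])

# Coefficient path: larger lambda shrinks the estimates toward zero.
for lam in [0.0, 0.1, 1.0, 10.0, 100.0]:
    print(f"lambda={lam:6.1f}  coefficients={ridge_closed_form(H, y, lam)}")
```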
-
-
Week 5 - Feature Selection & Lasso
-
Learning Objectives
- Perform feature selection using “all subsets” and “forward stepwise” algorithms
- Analyze computational costs of these algorithms
- Contrast greedy and optimal algorithms
- Formulate lasso objective
- Describe what happens to estimated lasso coefficients as tuning parameter λ is varied
- Interpret lasso coefficient path plot
- Contrast ridge and lasso regression
- Describe geometrically why L1 penalty leads to sparsity
- Estimate lasso regression parameters using an iterative coordinate descent algorithm
- Implement K-fold cross validation to select lasso tuning parameter λ
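A sketch of lasso via cyclic coordinate descent with soft thresholding, for the objective RSS(w) + λ‖w‖₁ with an unpenalized intercept; the data and λ grid are illustrative:

```python
import numpy as np

def lasso_coordinate_descent(H, y, lam, n_iter=200):
    """Cyclic coordinate descent for RSS(w) + lam * ||w[1:]||_1.
    Feature 0 is the intercept and is not penalized."""
    n, d = H.shape
    w = np.zeros(d)
    z = (H ** 2).sum(axis=0)                 # per-feature normalizers
    for _ in range(n_iter):
        for j in range(d):
            # Residual ignoring feature j's current contribution.
            r_j = y - H @ w + H[:, j] * w[j]
            rho = H[:, j] @ r_j
            if j == 0:                       # intercept: plain least-squares step
                w[j] = rho / z[j]
            elif rho < -lam / 2:             # soft-thresholding update
                w[j] = (rho + lam / 2) / z[j]
            elif rho > lam / 2:
                w[j] = (rho - lam / 2) / z[j]
            else:
                w[j] = 0.0                   # coefficient snaps exactly to zero
    return w

# Illustrative data: only two of five features actually matter.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
H = np.column_stack([np.ones(200), X])

for lam in [0.0, 10.0, 100.0, 1000.0]:
    print(f"lambda={lam:7.1f}  w={np.round(lasso_coordinate_descent(H, y, lam), 2)}")
```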
-
-
Week 6 - Nearest Neighbors & Kernel Regression
-
Learning Objectives
- Motivate the use of nearest neighbor (NN) regression
- Define distance metrics in 1D and multiple dimensions
- Perform NN and k-NN regression
- Analyze computational costs of these algorithms
- Discuss sensitivity of NN to lack of data, dimensionality, and noise
- Perform weighted k-NN and define weights using a kernel
- Define and implement kernel regression
- Describe the effect of varying the kernel bandwidth λ or # of nearest neighbors k
- Select λ or k using cross validation
- Compare and contrast kernel regression with a global average fit
- Define what makes an approach nonparametric and why NN and kernel regression are considered nonparametric methods
- Analyze the limiting behavior of NN regression
- Use NN for classification
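A compact sketch of k-NN regression and Gaussian-kernel (Nadaraya-Watson) regression on synthetic 1-D data; the bandwidth and k values are arbitrary choices for illustration:

```python
import numpy as np

# Illustrative 1-D data with a nonlinear trend.
rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 10, size=200))
y = np.sin(x) + rng.normal(scale=0.2, size=200)

def knn_regress(x0, k=10):
    """Average the targets of the k nearest training points."""
    nearest = np.argsort(np.abs(x - x0))[:k]
    return y[nearest].mean()

def kernel_regress(x0, bandwidth=0.5):
    """Nadaraya-Watson estimate with a Gaussian kernel of the given bandwidth."""
    weights = np.exp(-((x - x0) ** 2) / (2 * bandwidth ** 2))
    return (weights @ y) / weights.sum()

for x0 in [2.0, 5.0, 8.0]:
    print(f"x0={x0}: k-NN={knn_regress(x0):.2f}  "
          f"kernel={kernel_regress(x0):.2f}  true={np.sin(x0):.2f}")
```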
-
-
Week 1 - Linear Classifiers & Logistic Regression
-
Learning Objectives
- Describe decision boundaries and linear classifiers
- Use class probability to express degree of confidence in prediction
- Define a logistic regression model
- Interpret logistic regression outputs as class probabilities
- Describe impact of coefficient values on logistic regression output
- Use 1-hot encoding to represent categorical inputs
- Perform multiclass classification using the 1-versus-all approach
-
-
Week 2 - Learning Linear Classifiers
-
Learning Objectives
- Identify when overfitting is happening
- Relate large learned coefficients to overfitting
- Describe the impact of overfitting on decision boundaries and predicted probabilities of linear classifiers
- Motivate the form of the L2-regularized logistic regression quality metric
- Describe what happens to estimated coefficients as tuning parameter λ is varied
- Interpret coefficient path plot
- Estimate L2-regularized logistic regression coefficients using gradient ascent
- Describe the use of L1 regularization to obtain sparse logistic regression solutions
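A sketch of gradient ascent on the L2-regularized log likelihood for logistic regression (0/1 labels, unpenalized intercept); the data, step size, and λ are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_l2(H, y, lam=1.0, step=0.5, n_iter=5000):
    """Gradient ascent on the L2-regularized log likelihood.
    Labels y are 0/1; the intercept (column 0) is not penalized."""
    w = np.zeros(H.shape[1])
    for _ in range(n_iter):
        gradient = H.T @ (y - sigmoid(H @ w))  # gradient of the log likelihood
        gradient[1:] -= 2 * lam * w[1:]        # gradient of the -lam * ||w||^2 penalty
        w += (step / len(y)) * gradient        # step scaled by n for stability
    return w

# Illustrative two-class data in two dimensions.
rng = np.random.default_rng(6)
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 2)),
               rng.normal(1.0, 1.0, size=(100, 2))])
y = np.repeat([0, 1], 100)
H = np.column_stack([np.ones(len(y)), X])

w = fit_logistic_l2(H, y)
print("coefficients:", np.round(w, 2))
print("training accuracy:", np.mean((sigmoid(H @ w) >= 0.5) == y))
```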
-
-
Week 3 - Decision Trees
-
Learning Objectives
- Define a decision tree classifier
- Interpret the output of a decision tree
- Learn a decision tree classifier using a greedy algorithm
- Traverse a decision tree to make predictions
- Majority class predictions
- Probability predictions
- Multiclass classification
-
-
Week 4 - Preventing Overfitting in Decision Trees
-
Learning Objectives
- Identify when overfitting occurs in decision trees
- Prevent overfitting with early stopping
- Limit tree depth
- Do not consider splits that do not reduce classification error
- Do not split intermediate nodes with only a few points
- Prevent overfitting by pruning complex trees
- Use a total cost formula that balances classification error and tree complexity
- Use total cost to merge potentially complex trees into simpler ones
- Describe common ways of handling missing data:
- Skip all rows with any missing values
- Skip features with many missing values
- Impute missing values using other data points
- Modify learning algorithm (decision trees) to handle missing data:
- Missing values get added to one branch of split
- Use classification error to determine where missing values go
-
-
Week 5 - Boosting
-
Learning Objectives
- Identify the notion of ensemble classifiers
- Formalize ensembles as the weighted combination of simpler classifiers
- Outline the boosting framework – sequentially learn classifiers on weighted data
- Describe the AdaBoost algorithm
- Learn each classifier on weighted data
- Compute coefficient of classifier
- Recompute data weights
- Normalize weights
- Implement AdaBoost to create an ensemble of decision stumps
- Discuss convergence properties of AdaBoost & how to pick the maximum number of iterations T
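A sketch of AdaBoost over decision stumps: each round fits a stump to the weighted data, computes its coefficient, and reweights the examples. The blob data is synthetic and the stump search is brute force for clarity:

```python
import numpy as np

def best_stump(X, y, weights):
    """Greedy decision stump: pick the (feature, threshold, sign) with the
    lowest weighted classification error. Labels are +1/-1."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= t, sign, -sign)
                err = weights[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def adaboost(X, y, n_rounds=20):
    """AdaBoost: reweight the data each round and learn a stump on the weights."""
    weights = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(n_rounds):
        err, j, t, sign = best_stump(X, y, weights)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)       # coefficient of this classifier
        pred = np.where(X[:, j] <= t, sign, -sign)
        weights *= np.exp(-alpha * y * pred)        # up-weight the mistakes
        weights /= weights.sum()                    # normalize the weights
        ensemble.append((alpha, j, t, sign))
    return ensemble

def predict(ensemble, X):
    scores = sum(alpha * np.where(X[:, j] <= t, sign, -sign)
                 for alpha, j, t, sign in ensemble)
    return np.sign(scores)

# Illustrative data: two overlapping blobs, labels in {-1, +1}.
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(-0.5, 1, (100, 2)), rng.normal(0.5, 1, (100, 2))])
y = np.repeat([-1, 1], 100)

model = adaboost(X, y, n_rounds=20)
print("training accuracy:", np.mean(predict(model, X) == y))
```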
-
-
Week 6 - Precision-Recall
-
Learning Objectives
- Explain why classification accuracy/error are not always the right metrics
- Precision captures fraction of positive predictions that are correct
- Recall captures fraction of positive data correctly identified by the model
- Trade-off precision & recall by setting probability thresholds
- Plot precision-recall curves
- Compare models by computing precision at k
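A small sketch of precision, recall, and precision-at-k computed from made-up probabilities and labels; sweeping the threshold shows the precision/recall trade-off:

```python
import numpy as np

# Illustrative predicted probabilities and true labels (1 = positive).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_prob = np.array([0.9, 0.8, 0.75, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1])

def precision_recall(threshold):
    """Trade off precision and recall by moving the probability threshold."""
    y_pred = (y_prob >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    precision = tp / max(y_pred.sum(), 1)  # fraction of positive predictions that are correct
    recall = tp / y_true.sum()             # fraction of true positives recovered
    return precision, recall

for threshold in [0.3, 0.5, 0.7]:
    p, r = precision_recall(threshold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")

# Precision at k: precision among the k highest-scoring predictions.
k = 3
top_k = np.argsort(y_prob)[::-1][:k]
print("precision@3:", y_true[top_k].mean())
```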
-
-
Week 7 - Scaling to Huge Datasets & Online Learning
-
Learning Objectives
- Significantly speed up a learning algorithm using stochastic gradient
- Describe intuition behind why stochastic gradient works
- Apply stochastic gradient in practice
- Describe online learning problems
- Relate stochastic gradient to online learning
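A sketch of stochastic gradient ascent for logistic regression: each update uses one randomly chosen example rather than the full-data gradient. The data, step size, and decay schedule are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative two-class data (0/1 labels) with an intercept column.
rng = np.random.default_rng(8)
X = np.vstack([rng.normal(-1, 1, (500, 2)), rng.normal(1, 1, (500, 2))])
y = np.repeat([0, 1], 500)
H = np.column_stack([np.ones(len(y)), X])

# Stochastic gradient ascent: update on one randomly chosen example at a time
# instead of summing the gradient over the full dataset each step.
w = np.zeros(H.shape[1])
step = 0.1
for t in range(20_000):
    i = rng.integers(len(y))
    w += step * (y[i] - sigmoid(H[i] @ w)) * H[i]
    step *= 0.9999                      # slowly decaying step size

print("training accuracy:", np.mean((sigmoid(H @ w) >= 0.5) == y))
```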
-
-
Week 1 - Welcome
-
Week 2 - Nearest Neighbor Search
-
Learning Objectives
- Implement nearest neighbor search for retrieval tasks
- Contrast document representations (e.g., raw word counts, tf-idf,...)
- Emphasize important words using tf-idf
- Contrast methods for measuring similarity between two documents
- Euclidean vs. weighted Euclidean
- Cosine similarity vs. similarity via unnormalized inner product
- Describe complexity of brute force search
- Implement KD-trees for nearest neighbor search
- Implement LSH for approximate nearest neighbor search
- Compare the pros and cons of KD-trees and LSH, and decide which is more appropriate for a given dataset
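A rough sketch of the LSH half of that comparison only: random hyperplanes hash each vector into a bucket, and the search looks just inside the query's bucket. The "document" vectors here are random placeholders, and a KD-tree sketch is omitted for brevity:

```python
import numpy as np

# Illustrative document vectors (e.g., tf-idf rows); values are placeholders.
rng = np.random.default_rng(9)
docs = rng.normal(size=(1000, 50))

# Random-hyperplane LSH: a document's bucket is the sign pattern of its
# projections onto a handful of random directions.
n_planes = 8
planes = rng.normal(size=(50, n_planes))

def bucket(v):
    return tuple((v @ planes) > 0)

table = {}
for i, d in enumerate(docs):
    table.setdefault(bucket(d), []).append(i)

def approx_nearest(query):
    """Search only the query's bucket, then rank candidates by cosine similarity."""
    candidates = table.get(bucket(query), [])
    if not candidates:
        return None
    sims = docs[candidates] @ query / (
        np.linalg.norm(docs[candidates], axis=1) * np.linalg.norm(query))
    return candidates[int(np.argmax(sims))]

# Sanity check: a query identical to doc 0 should retrieve doc 0.
print("approximate neighbor of doc 0:", approx_nearest(docs[0]))
```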
-
-
Week 3 - Clustering with k-means
-
Learning Objectives
- Describe potential applications of clustering
- Describe the input (unlabeled observations) and output (cluster labels) of a clustering algorithm
- Determine whether a task is supervised or unsupervised
- Cluster documents using k-means
- Interpret k-means as a coordinate descent algorithm
- Define data parallel problems
- Explain Map and Reduce steps of MapReduce framework
- Use existing MapReduce implementations to parallelize k-means, understanding what’s being done under the hood
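A single-machine sketch of k-means viewed as coordinate descent (assignment step, then centroid-update step) on synthetic blobs; the MapReduce parallelization mentioned above is not shown:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """k-means as coordinate descent: alternate between assigning points to the
    closest centroid and moving each centroid to the mean of its points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each centroid as the mean of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Illustrative data: three well-separated blobs.
rng = np.random.default_rng(10)
X = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in (-3, 0, 3)])
labels, centroids = kmeans(X, k=3)
print("centroids:\n", np.round(centroids, 2))
```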
-
-
Week 4 - Mixture Models
-
Learning Objectives
- Interpret a probabilistic model-based approach to clustering using mixture models
- Describe model parameters
- Motivate the utility of soft assignments and describe what they represent
- Discuss issues related to how the number of parameters grows with the number of dimensions
- Interpret diagonal covariance versions of mixtures of Gaussians
- Compare and contrast mixtures of Gaussians and k-means
- Implement an EM algorithm for inferring soft assignments and cluster parameters
- Determine an initialization strategy
- Implement a variant that helps avoid overfitting issues
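A sketch of EM for a 1-D mixture of Gaussians with soft assignments. The variance floor at the end of the M-step is one simple way to keep a component from collapsing onto a single point, standing in for the "avoid overfitting" variant; all data here is synthetic:

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=100):
    """EM for a 1-D mixture of Gaussians: the E-step computes soft assignments
    (responsibilities), the M-step re-estimates weights, means, and variances."""
    means = np.quantile(x, np.linspace(0.1, 0.9, k))   # spread-out initialization
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = (weights / np.sqrt(2 * np.pi * variances)
                * np.exp(-0.5 * (x[:, None] - means) ** 2 / variances))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        variances = np.maximum(variances, 1e-6)  # floor variances so no component
                                                 # collapses onto a single point
    return weights, means, variances

# Illustrative data drawn from two Gaussians.
rng = np.random.default_rng(11)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 700)])
print(em_gmm_1d(x, k=2))
```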
-
-
Week 5 - Mixed Membership Modeling via Latent Dirichlet Allocation
-
Learning Objectives
- Compare and contrast clustering and mixed membership models
- Describe a document clustering model for the bag-of-words document representation
- Interpret the components of the LDA mixed membership model
- Analyze a learned LDA model
- Topics in the corpus
- Topics per document
- Describe Gibbs sampling steps at a high level
- Utilize Gibbs sampling output to form predictions or estimate model parameters
- Implement collapsed Gibbs sampling for LDA
-
These assignments are released under the MIT license, Copyright (c) 2018 Mohamed el-Maghraby. See LICENSE.md for details.