Magho/ML-Washington-specialization-coursera

Master Machine Learning topics

Instructors:

Goals

  • Get an introduction to the exciting, high-demand field of Machine Learning
  • Gain applied experience in major areas of Machine Learning, including Prediction, Classification, Clustering, and Information Retrieval
  • Learn to analyze large and complex datasets and create systems that adapt and improve over time
  • Build intelligent applications that can make predictions from data

Courses

  • Week 1 - Welcome

  • Week 2 - Regression: Predicting House Prices

    • Learning Objectives

      • Describe the input (features) and output (real-valued predictions) of a regression model
      • Calculate a goodness-of-fit metric (e.g., RSS)
      • Estimate model parameters by minimizing RSS (algorithms to come...)
      • Exploit the estimated model to form predictions
      • Perform a training/test split of the data
      • Analyze performance of various regression models in terms of test error
      • Use test error to avoid overfitting when selecting amongst candidate models
      • Describe a regression model using multiple features
      • Describe other applications where regression is useful
    • Note - regression-intro

  • Week 3 - Classification: Analyzing Sentiment

    • Learning Objectives

      • Identify a classification problem and some common applications
      • Describe decision boundaries and linear classifiers
      • Train a classifier
      • Measure its error
      • Some rules of thumb for good accuracy
      • Interpret the types of error associated with classification
      • Describe the tradeoffs between model bias and data set size
      • Use class probability to express degree of confidence in prediction
    • Note - classification

  • Week 4 - Clustering and Similarity: Retrieving Documents

    • Learning Objectives

      • Describe ways to represent a document (e.g., raw word counts, tf-idf,...)
      • Measure the similarity between two documents
      • Discuss issues related to using raw word counts
      • Normalize counts to adjust for document length
      • Emphasize important words using tf-idf
      • Implement a nearest neighbor search for document retrieval
      • Describe the input (unlabeled observations) and output (labels) of a clustering algorithm
      • Determine whether a task is supervised or unsupervised
      • Cluster documents using k-means (algorithmic details to come...)
      • Describe other applications of clustering
    • Note - clustering-intro
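
As a rough illustration of the document-representation objectives above, here is a minimal sketch of tf-idf weighting and cosine similarity. It assumes pre-tokenized documents and the `log(N / df)` form of idf, which is only one of several common variants and may differ from the one used in the course notes. The names `tf_idf` and `cosine_similarity` are hypothetical helpers, not part of the assignments.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: weight} dict per tokenized document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each word once per document
    vectors = []
    for doc in documents:
        counts = Counter(doc)  # raw word counts (term frequency)
        # Assumed idf variant: log(N / df); words found in every document get weight 0.
        vectors.append({w: c * math.log(n_docs / doc_freq[w]) for w, c in counts.items()})
    return vectors

def cosine_similarity(a, b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
vectors = tf_idf(docs)
```

In this toy corpus "the" appears in every document, so its weight drops to zero, and two documents that share only "the" end up with zero similarity.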

  • Week 5 - Recommending Products

    • Learning Objectives

      • Describe the goal of a recommender system
      • Provide examples of applications where recommender systems are useful
      • Implement a co-occurrence based recommender system
      • Describe the input (observations, number of “topics”) and output (“topic” vectors, predicted values) of a matrix factorization model
      • Exploit estimated “topic” vectors (algorithms to come...) to make recommendations
      • Describe the cold-start problem and ways to handle it (e.g., incorporating features)
      • Analyze performance of various recommender systems in terms of precision and recall
      • Use AUC or precision-at-k to select amongst candidate algorithms
    • Note - recommenders-intro

  • Week 6 - Deep Learning: Searching for Images

    • Learning Objectives

      • Describe multi-layer neural network models
      • Interpret the role of features as local detectors in computer vision
      • Relate neural networks to hand-crafted image features
      • Describe some settings where deep learning achieves significant performance boosts
      • State the pros & cons of deep learning models
      • Apply the notion of transfer learning
      • Use neural network models trained in one domain as features for building a model in another domain
      • Build an image retrieval tool using deep features
    • Note 1 - deeplearning

    • Note 2 - closing

  • Week 1 - Welcome

    • Learning Objectives

      • Describe the input (features) and output (real-valued predictions) of a regression model
      • Calculate a goodness-of-fit metric (e.g., RSS)
      • Estimate model parameters to minimize RSS using gradient descent
      • Interpret estimated model parameters
      • Exploit the estimated model to form predictions
      • Discuss the possible influence of high leverage points
      • Describe intuitively how fitted line might change when assuming different goodness-of-fit metrics
    • Note 1 - intro for the course

    • Note 2 - simple regression
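
To make the "minimize RSS using gradient descent" objective above concrete, here is a minimal sketch for simple (one-feature) regression. The toy data, step size, and iteration count are assumptions picked so the loop converges; they are not taken from the course assignments.

```python
def rss(x, y, intercept, slope):
    """Residual sum of squares of the line intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

def fit_by_gradient_descent(x, y, step_size=0.01, iterations=5000):
    """Minimize RSS over (intercept, slope) with plain gradient descent."""
    intercept, slope = 0.0, 0.0
    for _ in range(iterations):
        errors = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Partial derivatives of RSS with respect to intercept and slope.
        grad_intercept = -2.0 * sum(errors)
        grad_slope = -2.0 * sum(e * xi for e, xi in zip(errors, x))
        intercept -= step_size * grad_intercept
        slope -= step_size * grad_slope
    return intercept, slope

# Toy data generated from y = 1 + 2x (hypothetical, for illustration only).
sqft = [1.0, 2.0, 3.0, 4.0]
price = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_by_gradient_descent(sqft, price)
```

The step size has to be small relative to the scale of the features; with larger feature values the same `step_size` can diverge.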

  • Week 2 - Multiple Regression

    • Learning Objectives

      • Describe polynomial regression
      • Detrend a time series using trend and seasonal components
      • Write a regression model using multiple inputs or features thereof
      • Cast both polynomial regression and regression with multiple inputs as regression with multiple features
      • Calculate a goodness-of-fit metric (e.g., RSS)
      • Estimate model parameters of a general multiple regression model to minimize RSS:
        • In closed form
        • Using an iterative gradient descent algorithm
      • Interpret the coefficients of a non-featurized multiple regression fit
      • Exploit the estimated model to form predictions
      • Explain applications of multiple regression beyond house price modeling
    • Note - multiple regression

  • Week 3 - Assessing Performance

    • Learning Objectives

      • Describe what a loss function is and give examples
      • Contrast training, generalization, and test error
      • Compute training and test error given a loss function
      • Discuss issue of assessing performance on training set
      • Describe tradeoffs in forming training/test splits
      • List and interpret the 3 sources of avg. prediction error:
        • Irreducible error, bias, and variance
      • Discuss issue of selecting model complexity on test data and then using test error to assess generalization error
      • Motivate use of a validation set for selecting tuning parameters (e.g., model complexity)
      • Describe overall regression workflow
    • Note - assessing performance

  • Week 4 - Ridge Regression

    • Learning Objectives

      • Describe what happens to magnitude of estimated coefficients when model is overfit
      • Motivate form of ridge regression cost function
      • Describe what happens to estimated coefficients of ridge regression as tuning parameter λ is varied
      • Interpret coefficient path plot
      • Estimate ridge regression parameters:
        • In closed form
        • Using an iterative gradient descent algorithm
      • Implement K-fold cross validation to select the ridge regression tuning parameter λ
    • Note - ridge regression
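
The effect of the tuning parameter λ on ridge coefficients can be sketched in the simplest possible setting: one feature and no intercept, where minimizing RSS(w) + λw² has the closed form w = Σxy / (Σx² + λ). This single-feature simplification and the name `ridge_coefficient` are assumptions made for illustration.

```python
def ridge_coefficient(x, y, l2_penalty):
    """Closed-form ridge solution for a single feature with no intercept."""
    numerator = sum(xi * yi for xi, yi in zip(x, y))
    denominator = sum(xi * xi for xi in x) + l2_penalty
    return numerator / denominator

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

# Coefficient shrinks toward zero as the penalty grows.
path = [ridge_coefficient(x, y, lam) for lam in (0.0, 14.0, 1e6)]
```

With λ = 0 this is the ordinary least-squares fit; increasing λ moves the coefficient steadily toward zero without ever making it exactly zero.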

  • Week 5 - Feature Selection & Lasso

    • Learning Objectives

      • Perform feature selection using “all subsets” and “forward stepwise” algorithms
      • Analyze computational costs of these algorithms
      • Contrast greedy and optimal algorithms
      • Formulate lasso objective
      • Describe what happens to estimated lasso coefficients as tuning parameter λ is varied
      • Interpret lasso coefficient path plot
      • Contrast ridge and lasso regression
      • Describe geometrically why L1 penalty leads to sparsity
      • Estimate lasso regression parameters using an iterative coordinate descent algorithm
      • Implement K-fold cross validation to select lasso tuning parameter λ
    • Note - lasso regression
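
The coordinate descent objective above can be sketched as follows, assuming the lasso cost RSS(w) + λ‖w‖₁ with no intercept and unnormalized features. The function names and toy data are hypothetical, and the sketch assumes no feature column is all zeros.

```python
def soft_threshold(rho, half_penalty):
    """Shrink rho toward zero by half_penalty, clipping to exactly 0 inside the band."""
    if rho < -half_penalty:
        return rho + half_penalty
    if rho > half_penalty:
        return rho - half_penalty
    return 0.0

def lasso_coordinate_descent(rows, targets, l1_penalty, sweeps=100):
    """Cyclic coordinate descent for RSS(w) + l1_penalty * sum(|w_j|)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    for _ in range(sweeps):
        for j in range(n_features):
            rho = 0.0  # correlation of feature j with the residual that ignores feature j
            z = 0.0    # squared norm of feature j (assumed non-zero)
            for row, y in zip(rows, targets):
                prediction = sum(w * v for w, v in zip(weights, row))
                rho += row[j] * (y - prediction + weights[j] * row[j])
                z += row[j] ** 2
            weights[j] = soft_threshold(rho, l1_penalty / 2.0) / z
    return weights

rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
targets = [3.0, 0.5, 3.0, 0.5]
sparse_weights = lasso_coordinate_descent(rows, targets, l1_penalty=4.0)
dense_weights = lasso_coordinate_descent(rows, targets, l1_penalty=0.0)
```

On this toy data the penalized fit sets the weakly correlated second coefficient to exactly zero, which is the sparsity behaviour that distinguishes lasso from ridge.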

  • Week 6 - Nearest Neighbors & Kernel Regression

    • Learning Objectives

      • Motivate the use of nearest neighbor (NN) regression
      • Define distance metrics in 1D and multiple dimensions
      • Perform NN and k-NN regression
      • Analyze computational costs of these algorithms
      • Discuss sensitivity of NN to lack of data, dimensionality, and noise
      • Perform weighted k-NN and define weights using a kernel
      • Define and implement kernel regression
      • Describe the effect of varying the kernel bandwidth λ or # of nearest neighbors k
      • Select λ or k using cross validation
      • Compare and contrast kernel regression with a global average fit
      • Define what makes an approach nonparametric and why NN and kernel regression are considered nonparametric methods
      • Analyze the limiting behavior of NN regression
      • Use NN for classification
    • Note 1 - kernel regression

    • Note 2 - summary
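
A minimal sketch of k-NN regression and Gaussian-kernel regression in one dimension, to illustrate the objectives above. The Gaussian kernel is one possible choice, and the function names and toy data are assumptions, not code from the assignments.

```python
import math

def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to query (requires k <= len(train_x))."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))[:k]
    return sum(y for _, y in nearest) / k

def kernel_predict(train_x, train_y, query, bandwidth):
    """Weighted average of all targets, with Gaussian kernel weights on distance."""
    weights = [math.exp(-((x - query) ** 2) / (2.0 * bandwidth ** 2)) for x in train_x]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

train_x = [1.0, 2.0, 3.0, 10.0]
train_y = [1.0, 2.0, 3.0, 10.0]

knn_value = knn_predict(train_x, train_y, query=2.1, k=2)
narrow = kernel_predict(train_x, train_y, query=2.0, bandwidth=0.1)
wide = kernel_predict(train_x, train_y, query=2.0, bandwidth=1000.0)
```

A very small bandwidth makes the prediction follow the nearest point, while a very large one approaches the global average of the targets.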

  • Week 1 - Welcome

    • Learning Objectives

      • Describe decision boundaries and linear classifiers
      • Use class probability to express degree of confidence in prediction
      • Define a logistic regression model
      • Interpret logistic regression outputs as class probabilities
      • Describe impact of coefficient values on logistic regression output
      • Use 1-hot encoding to represent categorical inputs
      • Perform multiclass classification using the 1-versus-all approach
    • Note 1 - intro for the course

    • Note 2 - logistic-regression-model
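
To illustrate how logistic regression turns a linear score into a class probability, here is a minimal sketch. It assumes the first feature is a constant 1 acting as the intercept term; `predict_probability` is a hypothetical name, and the toy coefficients are made up.

```python
import math

def predict_probability(features, coefficients):
    """P(y = +1 | x) under a logistic regression model: sigmoid of the linear score."""
    score = sum(w * x for w, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-score))

# features[0] is the constant 1 for the intercept (an assumption of this sketch).
on_boundary = predict_probability([1.0, 2.0], [0.0, 0.0])
positive = predict_probability([1.0, 2.0], [0.0, 1.0])
more_confident = predict_probability([1.0, 2.0], [0.0, 3.0])
```

A score of zero sits on the decision boundary with probability 0.5, and scaling the coefficients up pushes the predicted probability closer to 0 or 1.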

  • Week 2 - Learning Linear Classifiers

    • Learning Objectives

      • Identify when overfitting is happening
      • Relate large learned coefficients to overfitting
      • Describe the impact of overfitting on decision boundaries and predicted probabilities of linear classifiers
      • Motivate the form of L2 regularized logistic regression quality metric
      • Describe what happens to estimated coefficients as tuning parameter λ is varied
      • Interpret coefficient path plot
      • Estimate L2 regularized logistic regression coefficients using gradient ascent
      • Describe the use of L1 regularization to obtain sparse logistic regression solutions
    • Note - 2.1_logistic-regression-learning

    • Note - 2.2_logistic-regression-learning

  • Week 3 - Decision Trees

    • Learning Objectives

      • Define a decision tree classifier
      • Interpret the output of a decision tree
      • Learn a decision tree classifier using a greedy algorithm
      • Traverse a decision tree to make predictions:
        • Majority class predictions
        • Probability predictions
        • Multiclass classification
    • Note - decision-trees

  • Week 4 - Preventing Overfitting in Decision Trees

    • Learning Objectives

      • Identify when overfitting occurs in decision trees
      • Prevent overfitting with early stopping:
        • Limit tree depth
        • Do not consider splits that do not reduce classification error
        • Do not split intermediate nodes with only a few points
      • Prevent overfitting by pruning complex trees:
        • Use a total cost formula that balances classification error and tree complexity
        • Use total cost to merge potentially complex trees into simpler ones
      • Describe common ways to handle missing data:
        • Skip all rows with any missing values
        • Skip features with many missing values
        • Impute missing values using other data points
      • Modify learning algorithm (decision trees) to handle missing data:
        • Missing values get added to one branch of split
        • Use classification error to determine where missing values go
    • Note - 4.1_decision-trees-overfitting

    • Note - 4.2_decision-trees-overfitting

  • Week 5 - Boosting

    • Learning Objectives

      • Identify the notion of ensemble classifiers
      • Formalize ensembles as the weighted combination of simpler classifiers
      • Outline the boosting framework – sequentially learn classifiers on weighted data
      • Describe the AdaBoost algorithm:
        • Learn each classifier on weighted data
        • Compute coefficient of classifier
        • Recompute data weights
        • Normalize weights
      • Implement AdaBoost to create an ensemble of decision stumps
      • Discuss convergence properties of AdaBoost & how to pick the maximum number of iterations T
    • Note - Boosting
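
The AdaBoost steps listed above (compute the classifier coefficient, recompute data weights, normalize) can be sketched for a single round as follows. The weak classifier itself is left out: the sketch only takes a list saying which points it got right. `adaboost_round` is a hypothetical name, and the code assumes the weighted error is strictly between 0 and 1.

```python
import math

def adaboost_round(weights, is_correct):
    """One AdaBoost round: returns the classifier coefficient and the new data weights."""
    # Weighted error of the weak classifier on the current data weights.
    weighted_error = sum(w for w, ok in zip(weights, is_correct) if not ok) / sum(weights)
    coefficient = 0.5 * math.log((1.0 - weighted_error) / weighted_error)
    # Down-weight correctly classified points, up-weight mistakes.
    updated = [w * math.exp(-coefficient if ok else coefficient)
               for w, ok in zip(weights, is_correct)]
    total = sum(updated)
    return coefficient, [w / total for w in updated]

coefficient, new_weights = adaboost_round([0.25, 0.25, 0.25, 0.25],
                                          [True, True, True, False])
```

After normalization the single misclassified point carries half of the total weight, so the next weak classifier is pushed to get it right.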

  • Week 6 - Precision-Recall

    • Learning Objectives

      • Classification accuracy/error are not always the right metrics
      • Precision captures fraction of positive predictions that are correct
      • Recall captures fraction of positive data correctly identified by the model
      • Trade off precision & recall by setting probability thresholds
      • Plot precision-recall curves
      • Compare models by computing precision at k
    • Note 1 - precision-recall
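
A minimal sketch of the precision and recall definitions above, including the effect of moving the probability threshold. Labels are assumed to be 1 (positive) and 0 (negative); the function name and the toy scores are made up for illustration.

```python
def precision_recall(labels, probabilities, threshold=0.5):
    """Precision and recall when predicting positive for probability >= threshold."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, y_hat in pairs if y == 1 and y_hat == 1)
    false_pos = sum(1 for y, y_hat in pairs if y == 0 and y_hat == 1)
    false_neg = sum(1 for y, y_hat in pairs if y == 1 and y_hat == 0)
    # Return 0.0 when a ratio is undefined (a convention chosen for this sketch).
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

labels = [1, 1, 1, 0, 0]
probabilities = [0.9, 0.8, 0.4, 0.6, 0.1]

strict = precision_recall(labels, probabilities, threshold=0.7)
lenient = precision_recall(labels, probabilities, threshold=0.3)
```

Raising the threshold makes the classifier more conservative (higher precision, lower recall), and lowering it does the opposite.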

  • Week 7 - Scaling to Huge Datasets & Online Learning

    • Learning Objectives

      • Significantly speed up a learning algorithm using stochastic gradient
      • Describe intuition behind why stochastic gradient works
      • Apply stochastic gradient in practice
      • Describe online learning problems
      • Relate stochastic gradient to online learning
    • Note 1 - online-learning

  • Week 1 - Welcome

  • Week 2 - Nearest Neighbor Search

    • Learning Objectives

      • Implement nearest neighbor search for retrieval tasks
      • Contrast document representations (e.g., raw word counts, tf-idf,...)
      • Emphasize important words using tf-idf
      • Contrast methods for measuring similarity between two documents
        • Euclidean vs. weighted Euclidean
        • Cosine similarity vs. similarity via unnormalized inner product
      • Describe complexity of brute force search
      • Implement KD-trees for nearest neighbor search
      • Implement LSH for approximate nearest neighbor search
      • Compare pros and cons of KD-trees and LSH, and decide which is more appropriate for given dataset
    • Note - retrieval-intro

  • Week 3 - Clustering with k-means

    • Learning Objectives

      • Describe potential applications of clustering
      • Describe the input (unlabeled observations) and output (labels) of a clustering algorithm
      • Determine whether a task is supervised or unsupervised
      • Cluster documents using k-means
      • Interpret k-means as a coordinate descent algorithm
      • Define data parallel problems
      • Explain Map and Reduce steps of MapReduce framework
      • Use existing MapReduce implementations to parallelize k-means, understanding what’s being done under the hood
    • Note - kmeans
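
The two alternating steps of k-means (assign points to the nearest centroid, then move each centroid to the mean of its points) can be sketched as below. Initial centroids are passed in explicitly to keep the example deterministic; the function names and toy points are assumptions.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm with a fixed number of iterations."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its closest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: squared_distance(p, centroids[k]))
                  for p in points]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        centroids = new_centroids
    return centroids, labels

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Each step can only lower the sum of squared distances to the assigned centroids, which is why k-means can be read as coordinate descent on that objective.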

  • Week 4 - Mixture Models

    • Learning Objectives

      • Interpret a probabilistic model-based approach to clustering using mixture models
      • Describe model parameters
      • Motivate the utility of soft assignments and describe what they represent
      • Discuss issues related to how the number of parameters grows with the number of dimensions
        • Interpret diagonal covariance versions of mixtures of Gaussians
      • Compare and contrast mixtures of Gaussians and k-means
      • Implement an EM algorithm for inferring soft assignments and cluster parameters
        • Determine an initialization strategy
        • Implement a variant that helps avoid overfitting issues
    • Note - mixmodel-EM
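
Soft assignments in a mixture of Gaussians come from the E-step of EM. The sketch below covers only that step, in one dimension, with the cluster weights, means, and variances given as inputs; the M-step, initialization, and any safeguards against underflow are left out. Names and toy values are assumptions.

```python
import math

def gaussian_pdf(x, mean, variance):
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def e_step(points, weights, means, variances):
    """Responsibility of each cluster for each point: weight * likelihood, normalized per point."""
    responsibilities = []
    for x in points:
        scores = [w * gaussian_pdf(x, m, v)
                  for w, m, v in zip(weights, means, variances)]
        total = sum(scores)  # assumed non-zero for these toy values
        responsibilities.append([s / total for s in scores])
    return responsibilities

# Two equally weighted unit-variance clusters centered at 0 and 2.
resp = e_step([0.0, 1.0, 2.0], weights=[0.5, 0.5], means=[0.0, 2.0], variances=[1.0, 1.0])
```

The point halfway between the two means is shared equally, while points sitting on a mean lean strongly toward that cluster without being assigned to it outright.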

  • Week 5 - Mixed Membership Modeling via Latent Dirichlet Allocation

    • Learning Objectives

      • Compare and contrast clustering and mixed membership models
      • Describe a document clustering model for the bag-of-words doc representation
      • Interpret the components of the LDA mixed membership model
      • Analyze a learned LDA model
        • Topics in the corpus
        • Topics per document
      • Describe Gibbs sampling steps at a high level
      • Utilize Gibbs sampling output to form predictions or estimate model parameters
      • Implement collapsed Gibbs sampling for LDA
    • Note - LDA

  • Week 6 - Hierarchical Clustering & Closing Remarks

License

These assignments are under the MIT license. Copyright (c) 2018 Mohamed el-Maghraby. LICENSE.md

Each week contains its assignment and data.
