This page has moved here; check that repo for updates.
- Have Anaconda (conda) installed
- git-clone this repository
- Open and run the Lecture_1.ipynb notebook
- Learn numpy
- Read this website about the diabetes dataset
- and this website about sklearn's LinearRegression
- Add one or more new features from the dataset, and solve using sklearn's LinearRegression
- Set up the matrix equation for linear regression with two features
- Solve the linear system and make sure the results are the same as those you got with sklearn
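A minimal sketch of these steps, assuming the `bmi` and `bp` columns of the diabetes dataset as the two features (any pair works); it fits sklearn's LinearRegression and then solves the same least-squares system directly:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

# Load the diabetes dataset; columns 2 and 3 are 'bmi' and 'bp'
X_all, y = load_diabetes(return_X_y=True)
X = X_all[:, [2, 3]]

# Fit with sklearn
model = LinearRegression().fit(X, y)

# Set up the matrix equation A w = y, with a column of ones for the intercept
A = np.hstack([np.ones((X.shape[0], 1)), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

print(w[0], model.intercept_)  # intercepts should match
print(w[1:], model.coef_)      # coefficients should match
```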
- Run the document classification Naive Bayes example.
- Play around with it: try it with more than the two classes used in the lecture.
- Make up a simple classification problem, write code to generate synthetic data, and train a Naive Bayes classifier on your data with sklearn.
- Make up a different classification problem, and design it so that a Naive Bayes classifier performs poorly. (Hint: generate data that violates the conditional-independence assumption that Naive Bayes makes.)
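A sketch of both exercises using scikit-learn's GaussianNB; the blob and XOR-style datasets below are made-up illustrations, not the lecture's data:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Easy problem: two Gaussian blobs with well-separated means
X0 = rng.normal(loc=[-2, -2], scale=1.0, size=(200, 2))
X1 = rng.normal(loc=[2, 2], scale=1.0, size=(200, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

clf = GaussianNB().fit(X, y)
print(clf.score(X, y))  # near 1.0: the classes are easy to separate

# Hard problem: XOR-style labels violate conditional independence --
# each feature's marginal looks identical for both classes
Xa = rng.uniform(-1, 1, size=(400, 2))
ya = (Xa[:, 0] * Xa[:, 1] > 0).astype(int)
print(GaussianNB().fit(Xa, ya).score(Xa, ya))  # near chance level
```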
- Derive Bayes rule. (Hint: start with the equation relating joint distribution to a conditional distribution)
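Following the hint, one way the derivation can go: the joint distribution factors in two ways,

$$P(A, B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A),$$

and dividing through by $P(B)$ (assuming $P(B) > 0$) gives Bayes' rule:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.$$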
- Spot and understand the "mistake" on the Intuition -> math -> stats slide
- Work through a few examples of 2x2 matrix-vector multiplications by hand. Compare to the results you get in code.
- If AB = I, is it the case that BA = I?
- What is the solution to the 2x2 matrix equation in the slides?
- Go to the end of Lecture_1.ipynb and find the slides / cells that we didn't go over in the lecture.
- Run the sklearn code that uses LinearRegression to fit a degree-3 polynomial.
- Write down the matrix equation (linear system) for the degree-3 polynomial fit.
- Write code to generate the matrices and vectors for the equation you wrote down in the previous step.
- Solve it yourself in code with `np.linalg.lstsq` and make sure you get the same answer as sklearn.
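One possible sketch of the comparison, using synthetic data since the notebook's dataset isn't reproduced here:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 1 + 2 * x - 3 * x**2 + 0.5 * x**3 + rng.normal(scale=0.1, size=50)

# Vandermonde-style design matrix with columns 1, x, x^2, x^3
A = np.vander(x, N=4, increasing=True)

# Direct least-squares solve of A w = y
w, *_ = np.linalg.lstsq(A, y, rcond=None)

# sklearn on the same design matrix (intercept is already a column of A)
model = LinearRegression(fit_intercept=False).fit(A, y)
print(w)
print(model.coef_)  # should match w
```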