One of the most popular parametric linear models is logistic regression. Although, strictly speaking, it is a regression model producing continuous values in [0, 1], a small trick (interpreting those values as posterior class probabilities) makes it one of the most useful tools available to CV/ML engineers for building a strong baseline before delving into deep learning.
Let’s assume we are interested in using logistic regression to classify a set of observations into two classes (binary classification), e.g., whether an email is spam or not. For this exercise we use the Breast Cancer Dataset, which you can easily load from the scikit-learn Python package as follows.
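A minimal version (the held-out split and its parameters here are illustrative, not a requirement):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load the Breast Cancer Dataset: 569 samples, 30 real-valued features,
# binary labels (malignant vs. benign).
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test split so we can check generalization on unseen data later.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```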
For our coding challenge, we are interested in learning the parameters of a logistic regression model on the Breast Cancer Dataset. Alongside this doc you’ll find our bare-bones implementation of logistic regression. In our implementation, we intend to train the model with Stochastic Gradient Descent + log-loss and get performance comparable to its better-known public implementation.
To test the correctness of your implementation we use the publicly available SGDClassifier as a strong baseline to provide guidelines on the expected accuracy. NOTE: We don’t expect your implementation to outperform SGDClassifier (we won’t complain if it does 😄).
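For reference, a baseline along these lines might be configured as follows. Treat this only as a sketch: choosing the right arguments is part of the exercise, and the feature scaling and hyperparameters shown here are our assumptions, not requirements.

```python
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A linear model trained with SGD on log-loss, i.e. logistic regression.
# (On scikit-learn < 1.1 the loss is spelled "log" rather than "log_loss".)
baseline = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss="log_loss", max_iter=1000, random_state=42),
)
baseline.fit(X_train, y_train)
print("Baseline accuracy:", baseline.score(X_test, y_test))
```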
- Modify the arguments for `SGDClassifier` to fully support a linear model. These modifications will depend on your implementation of `__compute_loss`.
- Derive the gradient updates for the weights and bias of the model from scratch. You can do it on a sheet of paper and send us a photo (a checkpoint for the result is given right after this list).
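As a checkpoint for that derivation, the standard result for a single sample $(x, y)$ with $y \in \{0, 1\}$ is:

$$\hat{y} = \sigma(w^\top x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}},$$

$$\mathcal{L}(y, \hat{y}) = -\big[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\big].$$

Using $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, the chain rule gives

$$\frac{\partial \mathcal{L}}{\partial w} = (\hat{y} - y)\, x, \qquad \frac{\partial \mathcal{L}}{\partial b} = \hat{y} - y.$$

Averaging these over a mini-batch yields the SGD update direction.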
As a coding task we’d like you to implement the following function(s) (a minimal end-to-end sketch follows this list):

- `clean_data`
  - Remove any noise in the training data with heuristics
- `fit`
  - Given features and ground-truth labels:
    - Loop over the data for epochs / iterations
    - Build a random mini-batch
    - Compute the log-loss using `__compute_loss` below
    - Compute the gradients given the log-loss and mini-batch
    - Update the weights (`self.w`) and bias (`self.b`) of the model
- `predict`
  - Given samples, predict the labels with the trained weights and bias
  - Currently we have set `predict` to assign random labels
- `__compute_loss`
  - Compute the loss over a batch
  - Currently we set it to a hard-coded value (0.0)
- `__compute_gradient`
  - Compute the gradient given the loss / batch
  - Currently set to zero (no updates)
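To make the expected shapes concrete, here is a minimal end-to-end sketch. Only `fit`, `predict`, `__compute_loss`, `__compute_gradient`, `self.w`, and `self.b` come from the provided skeleton; everything else (the class name, `lr`, `batch_size`, `n_epochs`, the seeded generator) is illustrative, and note that for log-loss the gradient can be computed directly from the batch without using the loss value itself.

```python
import numpy as np

class LogisticRegressionSGD:
    def __init__(self, n_features, lr=0.01, batch_size=32, n_epochs=100, seed=42):
        self.rng = np.random.default_rng(seed)
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr
        self.batch_size = batch_size
        self.n_epochs = n_epochs
        self.losses = []  # track mini-batch losses for the convergence check

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def __compute_loss(self, X, y):
        # Mean log-loss over the batch; eps guards against log(0).
        eps = 1e-12
        p = np.clip(self._sigmoid(X @ self.w + self.b), eps, 1 - eps)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    def __compute_gradient(self, X, y):
        # Gradients of the mean log-loss w.r.t. w and b (see derivation above).
        p = self._sigmoid(X @ self.w + self.b)
        grad_w = X.T @ (p - y) / len(y)
        grad_b = np.mean(p - y)
        return grad_w, grad_b

    def fit(self, X, y):
        for epoch in range(self.n_epochs):
            # Build random mini-batches by shuffling indices each epoch.
            idx = self.rng.permutation(len(y))
            for start in range(0, len(y), self.batch_size):
                batch = idx[start:start + self.batch_size]
                self.losses.append(self.__compute_loss(X[batch], y[batch]))
                grad_w, grad_b = self.__compute_gradient(X[batch], y[batch])
                self.w -= self.lr * grad_w
                self.b -= self.lr * grad_b
        return self

    def predict(self, X):
        # Threshold the predicted probability at 0.5.
        return (self._sigmoid(X @ self.w + self.b) >= 0.5).astype(int)
```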
We will evaluate your submission on the following criteria:

- Correctness: Does your code do the right thing?
- Objective: Comment on your choice of loss function
- Convergence: Does the loss decrease with the number of iterations?
- Blind baseline: Is your classifier better than a random classifier?
A few questions to answer along with your submission:

- Is minimizing the loss the best criterion to perform early stopping?
- Does the model guarantee performance on an unseen dataset?
- How do `lr` and `batch_size` affect convergence?
- Bonus questions:
  a. Is it possible to modify the training data and learn just the weight vector?
  b. Add a function `__dropout`, which randomly sets some of the feature values to zero during training (one possible form is sketched below). How will you incorporate it during `fit` / `predict`?
  c. Does `__dropout` help with convergence / overfitting?
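For orientation, one common form of such a `__dropout` is "inverted" dropout; the rate and the seeded `self.rng` generator here are our assumptions, and how you wire it into training is up to you:

```python
def __dropout(self, X, rate=0.2):
    # Zero out each feature value independently with probability `rate`,
    # rescaling the survivors by 1/(1 - rate) so that the expected input
    # magnitude is unchanged. Apply this only inside fit; predict should
    # see the raw, unmasked features.
    mask = self.rng.random(X.shape) >= rate
    return X * mask / (1.0 - rate)
```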