Data Science Notebook on a Classification Task
The Jupyter Notebook on this page uses the Census Income Dataset to predict whether an individual's income exceeds $50K/yr based on census data.
The Dataset can be found here:
The Notebook can be found here:
This Jupyter Notebook has a companion Mindmap/Cheatsheet that lists most of the Data Science steps. It can be found at the following link:
In this Notebook, we'll perform:
- Feature Exploration (Uni and Bi-variate)
- Feature Imputation
- Feature Selection
- Feature Encoding
- Feature Ranking
- Machine Learning with sklearn and TensorFlow
- Random Search
- Accuracy, Precision, Recall, and F1 calculations
- ROC Curve
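As a quick reference for the evaluation step in the list above, here is a minimal sketch of how accuracy, precision, recall, and F1 follow from the confusion counts of a binary classifier. It is not taken from the Notebook: the function name `classification_metrics` and the toy labels are made up for illustration, with 1 standing for "income above $50K" and 0 for everything else.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper for illustration; `positive` is the label
    treated as the positive class.
    """
    pairs = list(zip(y_true, y_pred))

    # Confusion counts
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 1 = income above $50K, 0 = otherwise
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these toy labels there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four metrics come out to 0.75. On real data they usually differ, which is why it helps to report them together.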
This Notebook has been designed to run on top of the Jupyter TensorFlow Docker image found in the link below:
If you haven't installed Docker at this point, please visit:
Then, open a shell or terminal session and copy/paste the following:
docker run -itd \
--restart always \
--name jupyter \
--hostname jupyter \
-p 8888:8888 \
-p 6006:6006 \
jupyter/tensorflow-notebook:latest \
start-notebook.sh --NotebookApp.token=''
Upon running the command, Docker will automatically pull the image it needs and start the container for us.
Give it a minute or so for Jupyter to start, and head to the following URL: http://localhost:8888
You should now have Jupyter running. If after a minute you can't reach the URL, check that the container is running correctly by typing:
# Check that the container is running
docker ps -a
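If you would rather script the wait than keep refreshing the browser, the check can be sketched in Python using only the standard library. This is an illustrative helper, not part of the Notebook or the Docker image: the name `wait_for_url` and its default timeout and interval are assumptions. It polls a URL until the server answers with any HTTP response or the timeout runs out, and it ignores proxy settings because the target is local.

```python
import time
import urllib.error
import urllib.request


def wait_for_url(url, timeout_s=60.0, interval_s=2.0):
    """Return True once `url` answers over HTTP, False if the timeout expires.

    Hypothetical helper for illustration; the defaults are arbitrary.
    """
    # An empty ProxyHandler disables any proxy configured in the environment
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout_s

    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=interval_s):
                return True
        except urllib.error.HTTPError:
            # The server responded, just not with a success status
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet; wait and try again
            time.sleep(interval_s)
    return False
```

Calling `wait_for_url("http://localhost:8888")` right after the `docker run` command should return `True` once Jupyter is up. If it returns `False`, the output of `docker ps -a` shows whether the container started at all.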
Download the Notebook from this link:
Go back to http://localhost:8888, load your Notebook into Jupyter, and run it. That's it!
Here are a few useful commands in case something goes wrong with your Docker container:
# Restart Jupyter Docker Container
docker restart jupyter
# Stop Jupyter Docker Container
docker stop jupyter
# Remove Jupyter Docker Container
docker rm jupyter
Twitter:
Linkedin:
Email: