Credit to Adam Green - adgefficiency.com
- research = worry about models
- actual applications = worry about data
- this course is about the data
- matplotlib APIs
- t-SNE & PCA
- cleaning columns in place
- work that has to be done
- adding new columns
- optional work that improves performance
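
The split between cleaning (required) and adding columns (optional) can be sketched as below. This is a minimal illustration that assumes pandas and NumPy are available. The DataFrame, its values, and the column names (`price`, `city`, `price_log`, `is_berlin`) are hypothetical and not taken from the course material.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: names and values are made up for illustration.
df = pd.DataFrame({
    "price": ["10", "20", None, "40"],
    "city": [" Berlin", "berlin ", "Paris", None],
})

# Cleaning columns in place: work that has to be done before modelling.
df["price"] = pd.to_numeric(df["price"])                # strings -> floats, None -> NaN
df["price"] = df["price"].fillna(df["price"].median())  # impute missing values
df["city"] = df["city"].str.strip().str.lower().fillna("unknown")

# Adding new columns: optional work that may improve performance.
df["price_log"] = np.log(df["price"])
df["is_berlin"] = (df["city"] == "berlin").astype(int)

print(df)
```

Cleaning overwrites the existing columns, while feature engineering leaves them intact and appends new ones, so the optional step can be dropped without losing the cleaned data.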
[imbalanced-classes.ipynb]
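
One common way to handle imbalanced classes is to weight each class inversely to its frequency. The sketch below is an assumption about what such a step might look like, not a reproduction of the notebook. The function name `balanced_class_weights` and the toy labels are hypothetical. The formula `n_samples / (n_classes * count)` is the heuristic behind scikit-learn's `class_weight="balanced"` option.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class, inversely proportional to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy labels: 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

The minority class receives the larger weight, so a misclassified rare example costs more during training than a misclassified common one.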
- what models to try
- how to select a model
- hyperparameter tuning
- linear model coefficients
- LIME
- univariate
- stability selection
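
Univariate feature selection scores each feature against the target on its own and keeps the highest-scoring ones. The sketch below uses absolute Pearson correlation as the score. That is one possible choice and an assumption here, since the outline does not name a scoring function. The feature names and values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: "signal" tracks the target, "noise" does not.
features = {
    "signal": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [1.0, -1.0, 0.0, -1.0, 1.0],
}
target = [2.1, 3.9, 6.2, 8.1, 9.8]

# Score every feature independently, then rank from most to least related.
scores = {name: abs(pearson(col, target)) for name, col in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because each feature is scored in isolation, this approach is cheap but cannot see interactions between features.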
NIMH Data Archive (NDA) - human subjects data collected from hundreds of research projects across many scientific domains, made available by the National Institute of Mental Health
Internet Archive - a non-profit library of millions of free books, movies, software, music, websites, and more
ProgrammableWeb - a directory of web APIs
UCI Machine Learning Repository
Common Crawl - an open repository of web crawl data that anyone can access and analyze