Fundamentals of data science for Data Science Retreat.

antahiap/DSR_DS_Fundamental

 
 

Data Science

Credit to Adam Green - adgefficiency.com

  • research = worry about models
  • actual applications = worry about data
  • this course is about the data

Course notes

visualization.ipynb

  • matplotlib APIs
  • t-SNE & PCA

data-cleaning.ipynb

  • cleaning columns in place
  • work that has to be done

feature-engineering.ipynb

  • adding new columns
  • optional work that improves performance
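The split between the two notebooks above (cleaning existing columns in place versus adding new columns) can be sketched with pandas. This is a minimal illustration and is not taken from the notebooks themselves: the toy data and the column names `price`, `sold_at`, `sold_month` and `is_weekend` are invented for the example.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "price": ["10.5", "n/a", "7.25"],
    "sold_at": ["2021-01-04", "2021-01-09", "2021-02-14"],
})

# Data cleaning: fix existing columns in place (work that has to be done).
df["price"] = pd.to_numeric(df["price"], errors="coerce")  # unparseable values become NaN
df["sold_at"] = pd.to_datetime(df["sold_at"])

# Feature engineering: add new columns derived from existing ones (optional work).
df["sold_month"] = df["sold_at"].dt.month
df["is_weekend"] = df["sold_at"].dt.dayofweek >= 5  # Monday=0, so 5 and 6 are the weekend
```

Cleaning leaves the column count unchanged, while feature engineering widens the frame from two columns to four.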

linear-models.ipynb

imbalanced-classes.ipynb

model-selection.ipynb

  • what models to try

model-evaluation.ipynb

  • how to select a model
  • hyperparameter tuning
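One common way to combine model selection with hyperparameter tuning is a cross-validated grid search. The sketch below assumes scikit-learn and a synthetic dataset; the estimator, the grid over `C` and the accuracy metric are illustrative choices, not necessarily the ones used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data, used only so the example is self-contained.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Try each candidate value of C with 5-fold cross-validation on the training set.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_C = search.best_params_["C"]
# The held-out test set is touched only once, after tuning has finished.
test_score = search.score(X_test, y_test)
```

Keeping the test set out of the search means `test_score` estimates performance on unseen data rather than on data the tuning has already seen.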

interpretation.ipynb

  • linear model coefficients
  • LIME

feature-selection.ipynb

  • univariate
  • stability selection
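Univariate selection scores each feature against the target independently and keeps the top scorers. A minimal sketch, assuming scikit-learn's `SelectKBest` with the ANOVA F-test on synthetic data; the value `k=3` is arbitrary, and stability selection is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 8 features, of which 3 carry signal about the label.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

# Score every feature on its own and keep the 3 with the highest F-statistic.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)

kept = selector.get_support(indices=True)  # column indices of the retained features
```

Because each feature is scored in isolation, this approach is cheap but can miss features that are only useful in combination.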

Where to find data

NIMH Data Archive (NDA) - the National Institute of Mental Health Data Archive provides human subjects data collected from hundreds of research projects across many scientific domains

StatLib Datasets Archive

Kaggle Datasets

Internet Archive - non-profit library of millions of free books, movies, software, music, websites, and more

r/datasets

Programmable Web - APIs

UCI Machine Learning Repository

Common Crawl - an open repository of web crawl data that can be accessed and analyzed by anyone

List of datasets for machine-learning research - Wikipedia

