Hands-On Statistics For Data Science

The official repo for Hands-On Statistics for Data Science

Describe and pre-process data with statistics in mind
1. Chapter 1. Fundamentals of data collection, cleaning and preprocessing
  1. Collecting data from various data sources
  2. Data imputation, pros and cons
  3. Outlier removal
  4. Data standardization, when and how
  5. Examples with scikit-learn preprocessing module
2. Chapter 2. Essential statistics for data assessment
  1. Classification of variable types: numerical and categorical
  2. Numerical variable: mean, median and mode
  3. Numerical variable: variance, standard deviation, percentiles and skewness
  4. Categorical variables and mixed data types
  5. Bivariate and multivariate descriptive statistics
3. Chapter 3. Visualization with statistical graphs
  1. Basic examples with Python matplotlib package
  2. Advanced visualization customization
  3. Query-oriented statistical plotting
  4. Presentation-ready plotting tips
Probability, hypothesis testing and the good old stuff
1. Chapter 4. Sampling and inferential statistics
  1. Population, sample and other key concepts
  2. Sampling done right
  3. Sampling distribution of statistics and relevant techniques
2. Chapter 5. Common probability distributions
  1. The family of discrete probability distributions
  2. The family of continuous probability distributions and the CLT
  3. Joint distribution and conditional distribution
  4. The power law and black swan
3. Chapter 6. Parametric estimation
  1. Overview of parametric estimation
  2. Properties of an estimator
  3. Maximum likelihood with examples
4. Chapter 7. Statistical hypothesis test
  1. Hypothesis test overview
  2. Confidence intervals and p-value
  3. Hypothesis test with statsmodels package
  4. The ANOVA model
  5. Statistical tests for time series models
  6. A/B testing with examples
Statistics in machine learning
1. Chapter 8. Statistics for regression tasks
  1. Simple linear regression
  2. Linear regression and estimator
  3. Multivariate linear regression and collinearity analysis
  4. Logistic regression and regularization
  5. Miscellaneous topics in regression
2. Chapter 9. Statistics for classification tasks
  1. Classification tasks overview
  2. Naive Bayesian classifier from scratch
  3. Support vector classifier
  4. Introduction to cross-validation
3. Chapter 10. Statistical techniques for tree-based methods
  1. Intuition and advantages of tree-based methods
  2. Ingredients of a classification tree with code
  3. Statistics of tree-based methods with scikit-learn
4. Chapter 11. Implementing statistics for ensemble learning
  1. Understanding random forests
  2. The technique of bagging
  3. Boosting
Appendix
1. Best practice collections
  1. Garbage in, garbage out
  2. How graphs mislead readers
  3. How causal arguments derail
2. Exercises, projects and further reading
  1. Exercises with selected answers
  2. Project suggestions for each chapter
  3. Further reading

rongpenl/HandsOnStatisticsForDataScience

Folders and files

Latest commit

History

Repository files navigation

Hands-On Statistics Fo Data Science

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages