The official repo for hands-on statistics for data science
- Describe and pre-process data with statistics in mind
- Chapter 1. Fundamentals of data collection, cleaning and preprocessing
- Collecting data from various data sources
- Data imputation: pros and cons
- Outlier removal
- Data standardization, when and how
- Examples with scikit-learn preprocessing module
- Chapter 2. Essential statistics for data assessment
- Classification of variable types: numerical and categorical
- Numerical variable: mean, median and mode
- Numerical variable: variance, standard deviation, percentiles and skewness
- Categorical variables and mixed data types
- Bivariate and multivariate descriptive statistics
- Chapter 3. Visualization with statistical graphs
- Basic examples with Python matplotlib package
- Advanced visualization customization
- Query-oriented statistical plotting
- Presentation-ready plotting tips
- Probability, hypothesis test and the good old stuff
- Chapter 4. Sampling and inferential statistics
- Population, sample and other key concepts
- Sampling done right
- Sampling distribution of statistics and relevant techniques
- Chapter 5. Common probability distributions
- The family of discrete probability distributions
- The family of continuous probability distributions and CLT
- Joint distribution and conditional distribution
- The power law and black swan
- Chapter 6. Parametric estimation
- Overview of parametric estimation
- Properties of an estimator
- Maximum likelihood with examples
- Chapter 7. Statistical hypothesis test
- Hypothesis test overview
- Confidence intervals and p-value
- Hypothesis test with statsmodels package
- The ANOVA model
- Statistical test for time series models
- A/B testing with examples
- Statistics in machine learning
- Chapter 8. Statistics for regression tasks
- Simple linear regression
- Linear regression and estimator
- Multivariate linear regression and collinearity analysis
- Logistic regression and regularization
- Miscellaneous topics in regression
- Chapter 9. Statistics for classification tasks
- Classification tasks overview
- Naive Bayesian classifier from scratch
- Support vector classifier
- Introduction to cross-validation
- Chapter 10. Statistical techniques for tree-based methods
- Intuition and advantages of tree-based methods
- Ingredients of a classification tree with code
- Statistics of tree-based methods with scikit-learn
- Chapter 11. Implementing statistics for ensemble learning
- Understanding random forests
- The technique of bagging
- Boosting
- Appendix
- Best practice collections
- Garbage in, garbage out
- How graphs mislead readers
- How causal arguments derail
- Exercises, projects and further reading
- Exercises with selected answers
- Project suggestions for each chapter
- Further reading