rongpenl/HandsOnStatisticsForDataScience

Hands-On Statistics for Data Science

The official repo for Hands-On Statistics for Data Science.

  1. Describe and preprocess data with statistics in mind
    1. Chapter 1. Fundamentals of data collection, cleaning and preprocessing
      1. Collecting data from various data sources
      2. Data imputation: pros and cons
      3. Outlier removal
      4. Data standardization: when and how
      5. Examples with the scikit-learn preprocessing module
    2. Chapter 2. Essential statistics for data assessment
      1. Classification of variable types: numerical and categorical
      2. Numerical variables: mean, median and mode
      3. Numerical variables: variance, standard deviation, percentiles and skewness
      4. Categorical variables and mixed data types
      5. Bivariate and multivariate descriptive statistics
    3. Chapter 3. Visualization with statistical graphs
      1. Basic examples with the Python matplotlib package
      2. Advanced visualization customization
      3. Query-oriented statistical plotting
      4. Presentation-ready plotting tips
  2. Probability, hypothesis testing and the good old stuff
    1. Chapter 4. Sampling and inferential statistics
      1. Population, sample and other key concepts
      2. Sampling done right
      3. Sampling distribution of statistics and relevant techniques
    2. Chapter 5. Common probability distributions
      1. The family of discrete probability distributions
      2. The family of continuous probability distributions and the central limit theorem (CLT)
      3. Joint distribution and conditional distribution
      4. Power laws and black swans
    3. Chapter 6. Parametric estimation
      1. Overview of parametric estimation
      2. Properties of an estimator
      3. Maximum likelihood with examples
    4. Chapter 7. Statistical hypothesis testing
      1. Hypothesis testing overview
      2. Confidence intervals and p-values
      3. Hypothesis testing with the statsmodels package
      4. The ANOVA model
      5. Statistical tests for time series models
      6. A/B testing with examples
  3. Statistics in machine learning
    1. Chapter 8. Statistics for regression tasks
      1. Simple linear regression
      2. Linear regression and its estimators
      3. Multivariate linear regression and collinearity analysis
      4. Logistic regression and regularization
      5. Miscellaneous topics in regression
    2. Chapter 9. Statistics for classification tasks
      1. Classification tasks overview
      2. Naive Bayes classifier from scratch
      3. Support vector classifier
      4. Introduction to cross-validation
    3. Chapter 10. Statistical techniques for tree-based methods
      1. Intuition and advantages of tree-based methods
      2. Ingredients of a classification tree with code
      3. Statistics of tree-based methods with scikit-learn
    4. Chapter 11. Implementing statistics for ensemble learning
      1. Understanding random forests
      2. The technique of bagging
      3. Boosting
  4. Appendix
    1. Best practice collections
      1. Garbage in, garbage out
      2. How graphs mislead readers
      3. How causal arguments derail
    2. Exercises, projects and further reading
      1. Exercises with selected answers
      2. Project suggestions for each chapter
      3. Further reading
