1 Geographic Data Science Lab, University of Liverpool, Liverpool, United Kingdom
* Correspondence: [email protected]
This site introduces the course Introduction to Statistical Learning in R. The course provides an introduction to statistics and probability, covering essential topics in descriptive statistics, inferential statistics and supervised machine learning. It adopts a problem-to-solution teaching approach: it defines a practical problem and illustrates how statistics can support critically informed decisions about a population by examining a random sample. It follows a learning-by-doing approach based on real-world examples from various contexts, and it teaches how to conduct statistical data analysis in R. The course is organised around six sessions, each combining key statistical concepts with practical application in R.
The course comprises three main components. The first focuses on descriptive statistics, covering descriptive statistics for different data types, common probability distributions, and measures of centrality and dispersion. The second covers inferential statistics, including hypothesis testing, confidence intervals, correlation and regression analysis. The third introduces supervised machine learning approaches and cross-validation.
Having successfully completed this course, you will be able to:
- Conduct exploratory statistical data analysis.
- Understand elementary probability distributions and data types.
- Perform correlation and regression data analysis using real-world data.
- Assess the statistical significance of relationships between variables of different data types.
- Carry out statistical data analysis in R.
- Understand the basics of supervised machine learning and cross-validation.
The notes for each session are:
- Session 1 Introduction to R: Data types & probability distributions
- Session 2 Descriptive Statistics: Measures of centrality & dispersion for continuous & categorical data
- Session 3 Statistical Significance: Hypothesis testing & confidence intervals
- Session 4 Correlation: Correlation visualisation & measures
- Session 5 Regression Analysis: Linear regression, dummy variables & logistic regression
- Session 6 Supervised Machine Learning: Tree Regressions, Random Forest & Cross-validation
If you use the material, code or processed data, you can give appropriate attribution by using the following citation:
@article{rowe_slr20,
author = {Francisco Rowe},
title = {Introduction to Statistical Learning in R},
year = 2020,
url = {\url{https://fcorowe.github.io/sl/}},
doi = {10.5281/zenodo.4007043},
}