The lecture has six chapters:
Chapters 3 to 6 can be summarized as "Statistical ML in Action".
Each chapter will keep us busy for two weeks (3 hours + 1 hour exercises).
Fetch everything by running
git clone
in your Git console, or by downloading everything as a ZIP file.
Download the large dataset "January 2018 - Yellow Taxi Trip Records" from this page.
Place it in the project subfolder "taxi/".
We will work with R version >= 4.4 and RStudio.
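To see whether the local R installation meets this requirement, a quick check along the following lines may help. It is only a sketch: the variable names `required`, `current`, and `version_ok` are illustrative and not part of the course material.

```r
# Compare the running R version with the minimum required by the lecture
required <- "4.4"
current <- getRversion()          # object of class "numeric_version"
version_ok <- current >= required

if (version_ok) {
  message("R ", current, " is recent enough.")
} else {
  message("R ", current, " is older than ", required, ". Please update R.")
}
```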
In the first two chapters, we will need these contributed R packages:
- tidyverse
- plotly
- insuranceData
- bench
- withr
- boot
- coin
For the remaining chapters, we further need:
- h2o (requires Java)
- arrow
- data.table
- duckdb
- sparklyr (requires Java)
- rpart.plot
- ranger
- xgboost
- lightgbm
- hstats
- MetricsWeighted
- keras (requires Python, see below)
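One possible way to install whatever is still missing from the two lists above is sketched below. The helper `missing_packages()` and the vector names are hypothetical, and the sketch assumes that all packages are available from your configured CRAN mirror. Java (for h2o and sparklyr) and Python (for keras) have to be set up separately.

```r
# Packages for Chapters 1 and 2
pkgs_ch1_2 <- c(
  "tidyverse", "plotly", "insuranceData", "bench", "withr", "boot", "coin"
)

# Additional packages for the remaining chapters
pkgs_ch3_6 <- c(
  "h2o", "arrow", "data.table", "duckdb", "sparklyr", "rpart.plot",
  "ranger", "xgboost", "lightgbm", "hstats", "MetricsWeighted", "keras"
)

# Hypothetical helper: returns the packages from 'pkgs' that are not installed
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(c(pkgs_ch1_2, pkgs_ch3_6))

if (length(to_install) == 0L) {
  message("All required packages are installed.")
} else {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Install only in an interactive session (e.g., the RStudio console)
  if (interactive()) {
    install.packages(to_install)
  }
}
```

Installing only the missing packages avoids reinstalling large ones such as h2o or arrow on every run.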
For the last chapter, we additionally need Python with TensorFlow >= 2.15. You can install it by running the R command keras::install_keras(version = "release-cpu"). If the following code works, you are all set. (Some red start-up messages/warnings are okay.)
library(tensorflow)
tf$constant("Hello Tensorflow!")
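If you also want to confirm that the TensorFlow version is at least 2.15, a check like the following might work. It is a sketch under assumptions: `tf_status()` is a hypothetical helper, and the code relies on `tensorflow::tf_version()` returning the version as major.minor, or `NULL` when TensorFlow cannot be found.

```r
# Hypothetical helper: classify an installed TensorFlow version
tf_status <- function(installed, required = "2.15") {
  if (is.null(installed)) {
    return("not found")
  }
  if (installed < required) "too old" else "ok"
}

# Only query TensorFlow if the R package 'tensorflow' is available
if (requireNamespace("tensorflow", quietly = TRUE)) {
  message("TensorFlow status: ", tf_status(tensorflow::tf_version()))
} else {
  message("The R package 'tensorflow' is not installed.")
}
```

Versions are compared component by component, so 2.9 counts as older than 2.15.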
- James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). An Introduction to Statistical Learning - with Applications in R. New York: Springer.
- Hastie, T., Tibshirani, R., Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer.
- Wickham, H., Grolemund, G. (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media.
- Chollet, F., Allaire, J. J. (2018). Deep Learning with R. Manning Publications Co.
- Hastie Big Data 45':
This lecture is distributed under the Creative Commons license.
Michael Mayer (2023), Statistical Computing, lecture notes, Institute of Mathematical Statistics and Actuarial Science, University of Bern. URL: