The lecture has six chapters:
Chapters 3 to 6 can be summarized as "Statistical ML in Action".
Each chapter will keep us busy for two weeks (3 hours + 1 hour exercises).
Fetch everything by running
git clone https://github.com/mayer79/statistical_computing_material.git
in your Git console, or by downloading everything as a ZIP file.
Download the large dataset "January 2022 - Yellow Taxi Trip Records" from this page or use the direct download link.
Place it in the project subfolder "taxi/".
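To confirm that the file landed in the right place, you can list the contents of the subfolder from R. This is a minimal sketch: it assumes the working directory is the project root and that the download is a Parquet file (an assumption, not stated above). The helper name find_taxi_files is made up for illustration.

```r
# Hypothetical helper: list data files in the taxi subfolder.
# Assumes the working directory is the project root and that the
# download is a Parquet file (adjust `pattern` if it is not).
find_taxi_files <- function(dir = "taxi", pattern = "\\.parquet$") {
  if (!dir.exists(dir)) {
    return(character(0))
  }
  list.files(dir, pattern = pattern, full.names = TRUE)
}

files <- find_taxi_files()
if (length(files) == 0) {
  message("No Parquet file found in 'taxi/'. Check the download location.")
} else {
  # File size in megabytes, as a rough check that the download completed
  print(data.frame(file = files, size_mb = round(file.size(files) / 1024^2, 1)))
}
```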
We will work with R version >= 4.1 and RStudio.
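If you are unsure which R version you are running, a short check along these lines may help. The helper name r_is_recent is hypothetical; getRversion() and numeric_version() are base R.

```r
# Hypothetical helper: TRUE if the running R is at least `required`
r_is_recent <- function(required = "4.1.0") {
  getRversion() >= numeric_version(required)
}

if (r_is_recent()) {
  message("R ", as.character(getRversion()), " is recent enough.")
} else {
  message("Please update R: version ", as.character(getRversion()),
          " is older than 4.1.")
}
```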
In the first two chapters, we will need these contributed R packages:
- tidyverse
- plotly
- insuranceData
- microbenchmark
- withr
- boot
- coin
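One possible way to find out which of these packages are still missing is sketched below. The helper name missing_packages is made up for illustration, and the install line is left commented out so that nothing is installed by accident. The same pattern should work for the longer package list of the remaining chapters.

```r
# Packages listed above for the first two chapters
pkgs <- c("tidyverse", "plotly", "insuranceData", "microbenchmark",
          "withr", "boot", "coin")

# Hypothetical helper: which of `pkgs` are not installed yet?
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(pkgs)
if (length(to_install) == 0) {
  message("All packages are already installed.")
} else {
  message("Missing: ", paste(to_install, collapse = ", "))
  # Uncomment to install from your configured CRAN mirror:
  # install.packages(to_install)
}
```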
For the remaining chapters, we further need:
- h2o (large package)
- arrow
- data.table
- FNN
- duckdb
- sparklyr (large package)
- rpart.plot
- ranger
- xgboost
- lightgbm
- flashlight
- keras (large, see below)
For the last chapter, we additionally need Python with TensorFlow >= 2.11. You can install it by running the R command keras::install_keras(version = "release-cpu"). If the following code works, you are all set. (Some red start-up messages/warnings are okay.)
library(tensorflow)
tf$constant("Hello Tensorflow!")
- James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). An Introduction to Statistical Learning - with Applications in R. New York: Springer.
- Hastie, T., Tibshirani, R., Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer.
- Wickham, H., Grolemund, G. (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media.
- Chollet, F., Allaire, J. J. (2018). Deep Learning with R. Manning Publications Co.
- Hastie Big Data 45': https://www.youtube.com/watch?v=0EWJZIC4JxA
This lecture is distributed under a Creative Commons license.
Michael Mayer (2023), Statistical Computing, lecture notes, Institute of Mathematical Statistics and Actuarial Science, University of Bern. URL: https://github.com/mayer79/statistical_computing_material