diff --git a/README.md b/README.md
index 99a0bef8..367ddc1b 100644
--- a/README.md
+++ b/README.md
@@ -8,15 +8,13 @@ Install the requirements under `tf/requirements.txt`. And call `./init.sh` to co
 
 ## Data preparation
 
-In order to start a training session you first need to download trainingdata from http://lczero.org/training_data. This data is packed in tar.gz balls each containing 10'000 games or chunks as we call them. Preparing data requires the following steps:
+In order to start a training session, you first need to download training data from https://storage.lczero.org/files/training_data/. Several chunks/games are packed into a tar file, and each tar file contains an hour's worth of chunks. Preparing data requires the following steps:
 
 ```
-tar -xzf games11160000.tar.gz
-ls training.* | parallel gzip {}
+wget https://storage.lczero.org/files/training_data/training-run1--20200711-2017.tar
+tar -xzf training-run1--20200711-2017.tar
 ```
 
-This repacks each chunk into a gzipped file ready to be parsed by the training pipeline. Note that the `parallel` command uses all your cores and can be installed with `apt-get install parallel`.
-
 ## Training pipeline
 
 Now that the data is in the right format one can configure a training pipeline. This configuration is achieved through a yaml file, see `training/tf/configs/example.yaml`:
@@ -27,7 +25,7 @@ Now that the data is in the right format one can configure a training pipeline.
 name: 'kb1-64x6'                       # ideally no spaces
 gpu: 0                                 # gpu id to process on
 
-dataset: 
+dataset:
   num_chunks: 100000                   # newest nof chunks to parse
   train_ratio: 0.90                    # trainingset ratio
   # For separated test and train data.
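
For reviewers reading the second hunk out of context, below is a minimal sketch of how the `dataset:` section of `training/tf/configs/example.yaml` might be filled in once the chunks above have been unpacked. The `input_train`/`input_test` keys and the paths are illustrative assumptions and are not part of this diff; only `num_chunks`, `train_ratio`, and the "separated test and train data" comment appear in the hunk itself.

```
dataset:
  num_chunks: 100000                   # newest nof chunks to parse
  train_ratio: 0.90                    # trainingset ratio
  # For separated test and train data.
  # The keys and paths below are assumed examples pointing at the directory
  # produced by extracting the tar above; they are not touched by this diff.
  input_train: '/path/to/chunks/train/'
  input_test: '/path/to/chunks/test/'
```

With something like this in place, the pipeline parses the newest `num_chunks` chunk files and splits them into training and test sets according to `train_ratio`, as described by the comments in the hunk.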