diff --git a/README.md b/README.md
index 99a0bef8..367ddc1b 100644
--- a/README.md
+++ b/README.md
@@ -8,15 +8,13 @@ Install the requirements under `tf/requirements.txt`. And call `./init.sh` to co
 
 ## Data preparation
 
-In order to start a training session you first need to download trainingdata from http://lczero.org/training_data. This data is packed in tar.gz balls each containing 10'000 games or chunks as we call them. Preparing data requires the following steps:
+In order to start a training session, you first need to download training data from https://storage.lczero.org/files/training_data/. Several chunks/games are packed into a tar file, and each tar file contains an hour's worth of chunks. Preparing data requires the following steps:
 
 ```
-tar -xzf games11160000.tar.gz
-ls training.* | parallel gzip {}
+wget https://storage.lczero.org/files/training_data/training-run1--20200711-2017.tar
+tar -xzf training-run1--20200711-2017.tar
 ```
 
-This repacks each chunk into a gzipped file ready to be parsed by the training pipeline. Note that the `parallel` command uses all your cores and can be installed with `apt-get install parallel`.
-
 ## Training pipeline
 
 Now that the data is in the right format one can configure a training pipeline. This configuration is achieved through a yaml file, see `training/tf/configs/example.yaml`:
@@ -27,7 +25,7 @@ Now that the data is in the right format one can configure a training pipeline.
 name: 'kb1-64x6'                       # ideally no spaces
 gpu: 0                                 # gpu id to process on
 
-dataset: 
+dataset:
   num_chunks: 100000                   # newest nof chunks to parse
   train_ratio: 0.90                    # trainingset ratio
   # For separated test and train data.
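
For reviewers reading the second hunk out of context, below is a minimal sketch of how the `dataset:` section of `training/tf/configs/example.yaml` might be filled in once the chunks above have been unpacked. The `input_train`/`input_test` keys and the paths are illustrative assumptions and are not part of this diff; only `num_chunks`, `train_ratio`, and the "separated test and train data" comment appear in the hunk itself.

```
dataset:
  num_chunks: 100000                   # newest nof chunks to parse
  train_ratio: 0.90                    # trainingset ratio
  # For separated test and train data.
  # The keys and paths below are assumed examples pointing at the directory
  # produced by extracting the tar above; they are not touched by this diff.
  input_train: '/path/to/chunks/train/'
  input_test: '/path/to/chunks/test/'
```

With something like this in place, the pipeline parses the newest `num_chunks` chunk files and splits them into training and test sets according to `train_ratio`, as described by the comments in the hunk.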