README: Update training data URL #135

Merged
merged 3 commits on Oct 10, 2020
Changes from 1 commit
7 changes: 4 additions & 3 deletions README.md
@@ -8,10 +8,11 @@ Install the requirements under `tf/requirements.txt`. And call `./init.sh` to co

## Data preparation

-In order to start a training session you first need to download trainingdata from http://lczero.org/training_data. This data is packed in tar.gz balls each containing 10'000 games or chunks as we call them. Preparing data requires the following steps:
+In order to start a training session you first need to download trainingdata from https://storage.lczero.org/files/training_data/. This data is packed in tar.gz balls each containing 10'000 games or chunks as we call them. Preparing data requires the following steps:
Contributor

This part is out of date as well. We have tars of compressed training files that are one hour's worth of training data each.
Since the files are already compressed inside the tar, the instructions below are also wrong; there is no need to run parallel gzip.
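A quick way to confirm this, assuming the tar from the updated instructions has already been downloaded; the expectation that the entries are already-gzipped chunk files comes from the comment above, not from the diff itself:

```
# List the archive contents without extracting; if the description above holds,
# the entries are already-compressed chunk files, so no separate gzip pass is
# needed after extraction.
tar -tf training-run1--20200711-2017.tar | head
```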

Contributor Author

Thanks, I have just updated the instructions.


```
-tar -xzf games11160000.tar.gz
+wget https://storage.lczero.org/files/training_data/training-run1--20200711-2017.tar
+tar -xzf training-run1--20200711-2017.tar
 ls training.* | parallel gzip {}
```
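For reference, a minimal sketch of what the preparation steps look like once the gzip pass is dropped, as the review discussion suggests; `tar -xf` is used because GNU tar auto-detects compression, and the `training.*` glob for the extracted chunk names is carried over from the snippet above rather than verified here.

```
# Sketch of the updated preparation: download the hourly tar, extract it, and
# skip the parallel gzip step because the chunk files inside are already compressed.
wget https://storage.lczero.org/files/training_data/training-run1--20200711-2017.tar
tar -xf training-run1--20200711-2017.tar
ls training.* | head   # the extracted chunks, ready to be fed to the pipeline
```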

@@ -27,7 +28,7 @@ Now that the data is in the right format one can configure a training pipeline.
 name: 'kb1-64x6' # ideally no spaces
 gpu: 0 # gpu id to process on

-dataset:
+dataset:
   num_chunks: 100000 # newest nof chunks to parse
   train_ratio: 0.90 # trainingset ratio
   # For separated test and train data.
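As a rough aid to reading the `dataset` fragment above: `num_chunks` caps how many of the newest chunk files the pipeline parses, and `train_ratio` splits them between training and test data. A hedged sanity check (the `training.*` glob is an assumption about the chunk file names, not something stated in this diff):

```
# Sketch: count the chunk files on disk before training, since the pipeline
# takes the newest num_chunks (100000 here) and holds out roughly 10% of them
# for testing (train_ratio: 0.90).
ls training.* | wc -l
```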