From e191fd17d787448220418d5aa7f3e48f48f3db83 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?S=C3=A9bastien=20Rombauts?=
Date: Tue, 18 Apr 2017 22:54:03 +0200
Subject: [PATCH] README: add info and useful tips from @ubergarm in issue #91

See https://github.com/sherjilozair/char-rnn-tensorflow/issues/91#issuecomment-286872803
---
 README.md | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index e881aab1..92f30d93 100644
--- a/README.md
+++ b/README.md
@@ -16,6 +16,11 @@ Inspired from Andrej Karpathy's [char-rnn](https://github.com/karpathy/char-rnn)
 To train with default parameters on the tinyshakespeare corpus, run `python train.py`. To access all the parameters use `python train.py --help`.
 
 To sample from a checkpointed model, `python sample.py`.
+Sampling while training is still in progress (to check the latest checkpoint) works only on the CPU or on another GPU.
+To force CPU mode, use `export CUDA_VISIBLE_DEVICES=""` and `unset CUDA_VISIBLE_DEVICES` afterward
+(on Windows: `set CUDA_VISIBLE_DEVICES=""` and `set CUDA_VISIBLE_DEVICES=`, respectively).
+
+To continue training after an interruption, or to train for more epochs, run `python train.py --init_from=save`.
 
 ## Datasets
 You can use any plain text file as input. For example you could download [The complete Sherlock Holmes](https://sherlock-holm.es/ascii/) as such:
@@ -30,7 +35,22 @@ mv cnus.txt input.txt
 
 Then start train from the top level directory using `python train.py --data_dir=./data/sherlock/`
 
-A quick tip to concatenate many small disparate `.txt` files into one large training file: `ls *.txt | xargs -L 1 cat >> input.txt`
+A quick tip to concatenate many small disparate `.txt` files into one large training file: `ls *.txt | xargs -L 1 cat >> input.txt`.
+
+## Tuning
+
+Tuning your models is still something of a "dark art" at this point. In general:
+
+1. Start with as much clean `input.txt` data as possible, e.g. 50 MiB.
+2. Establish a baseline using the default settings.
+3. Use Tensorboard to compare all of your runs visually to aid in experimenting.
+4. Tweak `--rnn_size` up somewhat from 128 if you have a lot of input data.
+5. Tweak `--num_layers` from 2 to 3, but no higher unless you know what you are doing.
+6. Tweak `--seq_length` up from 50 based on the length of a meaningful unit of input
+   (e.g. names are <= 12 characters, sentences may be up to 64 characters, etc.).
+   An LSTM cell can "remember" beyond this sequence length, but the effect falls off over longer character distances.
+7. Finally, once you have done all of the above, only then would I suggest adding some dropout.
+   Start with `--output_keep_prob 0.8`, and move to something like `--input_keep_prob 0.8 --output_keep_prob 0.5` only after exhausting the settings above.
 
 ## Tensorboard
 To visualize training progress, model graphs, and internal state histograms: fire up Tensorboard and point it at your `log_dir`. E.g.:
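
A minimal sketch of the CPU-only sampling workflow described above, assuming a GPU training run is already in progress and writing checkpoints to the `save` directory referenced by `--init_from=save`:

```bash
# Terminal 1: training occupies the GPU with the default parameters
python train.py

# Terminal 2: hide the GPU from this shell so sampling runs on the CPU,
# then sample from the latest checkpoint written by the training run
export CUDA_VISIBLE_DEVICES=""
python sample.py
unset CUDA_VISIBLE_DEVICES   # restore GPU visibility for later commands in this shell
```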
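
The concatenation tip in context, using a hypothetical `data/mycorpus/` directory holding many small `.txt` files (the directory name is only an example):

```bash
cd data/mycorpus                          # hypothetical directory of many small .txt files
ls *.txt | xargs -L 1 cat >> input.txt    # run once; appends every .txt file into input.txt
cd ../..
python train.py --data_dir=./data/mycorpus/
```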
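
A sketch of a tuning run combining the flags from the Tuning list above; the specific values (256, 3, 64, 0.8, 0.5) are illustrative picks for a large corpus, not values prescribed by the repository:

```bash
# bigger model and longer sequences for a large corpus, with dropout
# added only after the dropout-free settings have been explored
python train.py --data_dir=./data/sherlock/ \
                --rnn_size=256 \
                --num_layers=3 \
                --seq_length=64 \
                --input_keep_prob=0.8 \
                --output_keep_prob=0.5
```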