From e191fd17d787448220418d5aa7f3e48f48f3db83 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?S=C3=A9bastien=20Rombauts?=
Date: Tue, 18 Apr 2017 22:54:03 +0200
Subject: [PATCH] README: add info and useful tips from @ubergarm in issue #91

See https://github.com/sherjilozair/char-rnn-tensorflow/issues/91#issuecomment-286872803
---
 README.md | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index e881aab1..92f30d93 100644
--- a/README.md
+++ b/README.md
@@ -16,6 +16,11 @@ Inspired from Andrej Karpathy's [char-rnn](https://github.com/karpathy/char-rnn)
 To train with default parameters on the tinyshakespeare corpus, run `python train.py`. To access all the parameters use `python train.py --help`.
 
 To sample from a checkpointed model, `python sample.py`.
+Sampling while training is still in progress (to check the latest checkpoint) works only on the CPU or on another GPU.
+To force CPU mode, use `export CUDA_VISIBLE_DEVICES=""` and `unset CUDA_VISIBLE_DEVICES` afterward
+(on Windows: `set CUDA_VISIBLE_DEVICES=""` and `set CUDA_VISIBLE_DEVICES=`, respectively).
+
+To continue training after an interruption, or to train for more epochs, run `python train.py --init_from=save`.
 
 ## Datasets
 You can use any plain text file as input. For example you could download [The complete Sherlock Holmes](https://sherlock-holm.es/ascii/) as such:
@@ -30,7 +35,22 @@ mv cnus.txt input.txt
 
 Then start train from the top level directory using `python train.py --data_dir=./data/sherlock/`
 
-A quick tip to concatenate many small disparate `.txt` files into one large training file: `ls *.txt | xargs -L 1 cat >> input.txt`
+A quick tip to concatenate many small disparate `.txt` files into one large training file: `ls *.txt | xargs -L 1 cat >> input.txt`.
+
+## Tuning
+
+Tuning your models is still something of a "dark art" at this point. In general:
+
+1. Start with as much clean `input.txt` data as possible, e.g. 50 MiB.
+2. Establish a baseline using the default settings.
+3. Use Tensorboard to compare all of your runs visually to aid in experimenting.
+4. Tweak `--rnn_size` up somewhat from 128 if you have a lot of input data.
+5. Tweak `--num_layers` from 2 to 3, but no higher unless you know what you are doing.
+6. Tweak `--seq_length` up from 50 based on the length of a meaningful unit of input
+   (e.g. names are <= 12 characters, sentences may be up to 64 characters, etc.).
+   An LSTM cell can "remember" beyond this sequence length, but the effect falls off over longer character distances.
+7. Finally, once you have done all of the above, only then would I suggest adding some dropout.
+   Start with `--output_keep_prob 0.8`, and move to something like `--input_keep_prob 0.8 --output_keep_prob 0.5` only after exhausting the settings above.
 
 ## Tensorboard
 To visualize training progress, model graphs, and internal state histograms: fire up Tensorboard and point it at your `log_dir`. E.g.:
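
A minimal sketch of the CPU-only sampling workflow described above, assuming a GPU training run is already in progress and writing checkpoints to the `save` directory referenced by `--init_from=save`:

```bash
# Terminal 1: training occupies the GPU with the default parameters
python train.py

# Terminal 2: hide the GPU from this shell so sampling runs on the CPU,
# then sample from the latest checkpoint written by the training run
export CUDA_VISIBLE_DEVICES=""
python sample.py
unset CUDA_VISIBLE_DEVICES   # restore GPU visibility for later commands in this shell
```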
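
The concatenation tip in context, using a hypothetical `data/mycorpus/` directory holding many small `.txt` files (the directory name is only an example):

```bash
cd data/mycorpus                          # hypothetical directory of many small .txt files
ls *.txt | xargs -L 1 cat >> input.txt    # run once; appends every .txt file into input.txt
cd ../..
python train.py --data_dir=./data/mycorpus/
```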
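
A sketch of a tuning run combining the flags from the Tuning list above; the specific values (256, 3, 64, 0.8, 0.5) are illustrative picks for a large corpus, not values prescribed by the repository:

```bash
# bigger model and longer sequences for a large corpus, with dropout
# added only after the dropout-free settings have been explored
python train.py --data_dir=./data/sherlock/ \
                --rnn_size=256 \
                --num_layers=3 \
                --seq_length=64 \
                --input_keep_prob=0.8 \
                --output_keep_prob=0.5
```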