[BERT] update README (pytorch#391)
sgpyc authored Jun 19, 2020
1 parent 80fef7a commit 67c2ebe
Showing 1 changed file with 25 additions and 13 deletions: language_model/tensorflow/bert/README.md
The following command will run the clean up steps, and put the results in ./results.
After running the process_wiki.sh script on the 20200101 wiki dump, there will be 500 files named part-00xxx-of-00500 in the ./results directory.

Exact steps (starting in the bert path)

```shell
cd cleanup_scripts
mkdir -p wiki
cd wiki
git clone https://github.com/attardi/wikiextractor.git
python3 wikiextractor/WikiExtractor.py wiki/enwiki-20200101-pages-articles-multistream.xml # Results are placed in bert/cleanup_scripts/text
./process_wiki.sh '<text/*/wiki_??'
python3 extract_test_set_articles.py
```
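
As a quick check (not part of the original scripts), the number of cleaned shards can be verified; this assumes the commands above were run from the cleanup_scripts directory:

```shell
# Expect 500 shards named part-00xxx-of-00500 in ./results
ls ./results/part-00*-of-00500 | wc -l
```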

MD5sums:
7f59165e21b7d566db610ff6756c926b - bert_config.json
python3 create_pretraining_data.py \
```

The generated tfrecord has 500 parts, totalling ~365GB.
The dataset was generated using Python 3.7.6 and tensorflow-gpu 1.15.2.
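
A minimal sanity check on the inputs and outputs (a sketch: ./tfrecord_dir below is an illustrative path; point it at the directory used for --output_file above):

```shell
# bert_config.json should match the MD5 sum listed above
md5sum bert_config.json   # expect 7f59165e21b7d566db610ff6756c926b

# The generated pretraining data should have 500 parts totalling ~365GB
ls ./tfrecord_dir | wc -l
du -sh ./tfrecord_dir
```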

# Stopping criteria
The training should occur over a minimum of 3,000,000 samples. A valid submission will evaluate a masked lm accuracy >= 0.712.

The evaluation will be on the first 10,000 consecutive samples of the training set. The evaluation frequency is every 500,000 samples, starting from 3,000,000 samples. The evaluation can be either offline or online for v0.7. For more details, please refer to the training policy.

The generation of the evaluation set shard should follow the exact command shown above, using create_pretraining_data.py. In particular the seed (12345) must be set to ensure everyone evaluates on the same data.
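
A sketch of what generating that evaluation shard could look like, assuming the standard BERT create_pretraining_data.py flags; the file paths are illustrative, and the fixed seed is the requirement stated above:

```shell
# Illustrative paths; only the seed (12345) is mandated above
python3 create_pretraining_data.py \
  --input_file=./results/eval_text.txt \
  --output_file=./tfrecord_dir/eval_10k \
  --vocab_file=./vocab.txt \
  --do_lower_case=True \
  --max_seq_length=512 \
  --max_predictions_per_seq=76 \
  --random_seed=12345
```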

# Running the model
To run this model, use the following command.

```shell

TF_XLA_FLAGS='--tf_xla_auto_jit=2' \
python run_pretraining.py \
--bert_config_file=./bert_config.json \
--output_dir=/tmp/output/ \
python run_pretraining.py \
--iterations_per_loop=1000 \
--max_predictions_per_seq=76 \
--max_seq_length=512 \
--num_train_steps=682666666 \
--num_warmup_steps=1562 \
--optimizer=lamb \
--save_checkpoints_steps=20833 \
--start_warmup_step=0 \
--num_gpus=8 \
--train_batch_size=24

```
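
As a rough sanity check on the checkpoint cadence implied by these flags (a sketch; it assumes --train_batch_size is the global batch size):

```shell
# 20833 steps per checkpoint x 24 samples per step ~= 500,000 samples,
# which lines up with the evaluation frequency in the stopping criteria
echo $((20833 * 24))   # prints 499992
```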

The above parameters are for a machine with 8 V100 GPUs with 16GB memory each; the hyperparameters (learning rate, warmup steps, etc.) are for testing only. The training script won't print out the masked_lm_accuracy; to get masked_lm_accuracy, run_pretraining.py must be invoked separately, using the following command, on a V100 GPU with 16GB of memory:

```shell

TF_XLA_FLAGS='--tf_xla_auto_jit=2' \
python3 run_pretraining.py \
--bert_config_file=./bert_config.json \
--output_dir=/tmp/output/ \
python3 run_pretraining.py \
--max_predictions_per_seq=76 \
--max_seq_length=512 \
--num_gpus=1 \
--num_train_steps=682666666 \
--num_warmup_steps=1562 \
--optimizer=lamb \
--save_checkpoints_steps=20833 \
--start_warmup_step=0 \
--train_batch_size=24 \
--nouse_tpu

```
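
The masked_lm_accuracy is then read from the evaluation output. Assuming the script keeps the upstream BERT behaviour of writing metrics to eval_results.txt under --output_dir (an assumption, not confirmed here), something like the following would show it:

```shell
# Hypothetical location, based on upstream run_pretraining.py conventions
grep masked_lm_accuracy /tmp/output/eval_results.txt
```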

The model has been tested using the following stack:
- Debian GNU/Linux 9.12 GNU/Linux 4.9.0-12-amd64 x86_64
- NVIDIA Driver 440.64.00
- NVIDIA Docker 2.2.2-1 + Docker 19.03.8
- docker image tensorflow/tensorflow:2.2.0rc0-gpu-py3
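
For reference, a minimal sketch of starting that image with GPU access (mount paths are illustrative; assumes the --gpus flag available in Docker 19.03):

```shell
docker run --gpus all -it \
  -v "$(pwd)":/workspace -w /workspace \
  tensorflow/tensorflow:2.2.0rc0-gpu-py3 bash
```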
