- For computational efficiency, I aim to train BERT-SMALL (as described in the ELECTRA paper, ICLR 2020), using the ELECTRA framework rather than vanilla BERT's MLM task for pretraining.
- I referred to richarddwang's repository for the implementation.
- Ultimately, I will point out which hyperparameters work well for the BERT series, at least in the BERT-SMALL case.
- In brief, this repository aims to implement ELECTRA.
- As described in the BERT paper, BERT is trained in two steps.
- First, pretrain a BERT model on two tasks: masked language modeling and next sentence prediction.
- Second, fine-tune the pretrained BERT model for each downstream task.
- Since I just want to check benchmark scores for some of my other tasks, this repository only provides the related code base and information.
- As pointed out in ELECTRA (ICLR 2020), this is an inefficient way of pretraining a BERT model.
- In this repository, the pretraining process follows the ELECTRA framework for efficiency and performance, as sketched below.
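For reference, the ELECTRA objective fits in a few lines. The sketch below is illustrative, not this repository's exact code: `generator` and `discriminator` are assumed to map token ids of shape (B, L) to logits, and all names are placeholders.

```python
import torch
import torch.nn.functional as F

# A minimal sketch of ELECTRA's objective, not this repository's exact code.
# Assumptions: `generator` maps ids (B, L) -> logits (B, L, V); `discriminator`
# maps ids (B, L) -> logits (B, L, 1); `mask_positions` is a bool mask (B, L).
def electra_step(generator, discriminator, original_ids, masked_ids, mask_positions):
    # 1) Generator: masked language modeling loss on the masked positions.
    gen_logits = generator(masked_ids)                                  # (B, L, V)
    mlm_loss = F.cross_entropy(gen_logits[mask_positions],
                               original_ids[mask_positions])

    # 2) Sample replacement tokens from the generator. Sampling is detached,
    #    so no gradient flows from the discriminator into the generator.
    with torch.no_grad():
        sampled = torch.distributions.Categorical(
            logits=gen_logits[mask_positions]).sample()                 # (num_masked,)
    corrupted_ids = original_ids.clone()
    corrupted_ids[mask_positions] = sampled

    # 3) Discriminator: per-token binary classification, original vs. replaced.
    is_replaced = (corrupted_ids != original_ids).float()               # (B, L)
    disc_logits = discriminator(corrupted_ids).squeeze(-1)              # (B, L)
    disc_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)

    # The paper weights the discriminator loss with lambda = 50.
    return mlm_loss + 50.0 * disc_loss
```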
- pytorch 1.7+, numpy, python 3.7, tqdm, transformers
- DDP-based pre-training
- Simple word generation demo
- Fine-tuning for downstream tasks (e.g., GLUE Benchmark)
- Pretraining: English Wikipedia, BooksCorpus
- Before training, you should download the two datasets above and convert them into a single txt-format dataset. The converted dataset must be aligned sentence by sentence, one per line, separated by \n (a short conversion sketch follows the example below).
For example,
- I love you so much. \n
- The pig walks away from this farm. \n
- ...
- Tesla stock is going to be 2,000 dollars \n
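A minimal sketch of such a conversion, assuming the two corpora have already been extracted to plain-text files. The input file names are placeholders, and NLTK's punkt splitter is just one possible sentence segmenter:

```python
import nltk

# A minimal sketch for merging the raw corpora into one sentence-per-line txt
# file. The input file names are placeholders, and NLTK's punkt splitter is
# just one possible choice of sentence segmenter.
nltk.download("punkt")
from nltk.tokenize import sent_tokenize

with open("dataset.txt", "w", encoding="utf-8") as out:
    for path in ["wikipedia.txt", "bookscorpus.txt"]:  # placeholder names
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                for sentence in sent_tokenize(line):
                    out.write(sentence + "\n")
```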
For pretraining
- If you want to train a model with DDP:
CUDA_VISIBLE_DEVICES={device ids} python Pretraining.py --multiprocessing_distributed
- If you want to train a model with a single GPU:
CUDA_VISIBLE_DEVICES=0 python Pretraining.py
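For reference, a flag like --multiprocessing_distributed usually wraps the standard mp.spawn + DistributedDataParallel pattern. A rough sketch of that pattern, not this repository's exact code:

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

# A rough sketch of the usual mp.spawn + DDP pattern; not this repository's code.
def worker(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    model = DDP(nn.Linear(10, 10).cuda(rank), device_ids=[rank])  # toy model
    # ... the real training loop (with a DistributedSampler) goes here ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()  # matches CUDA_VISIBLE_DEVICES
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```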
For fine-tuning
- will be updated
For LM_DEMO
- When you want to check whether your model is working well, use LM_DEMO.py.
- First, make a txt file consisting of the sentences that you want to change.
- Second, pass the path of the generator's weight file as an argument when you run the LM_DEMO.py script.
- Run the LM_DEMO.py script as follows.
python LM_DEMO.py
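As a rough illustration of what such a demo does (not this repository's exact script), the sketch below fills [MASK] tokens with a generator's top predictions, using Google's pretrained ELECTRA-small generator from HuggingFace as a stand-in for your own weights:

```python
import torch
from transformers import ElectraTokenizerFast, ElectraForMaskedLM

# A rough illustration only: fills [MASK] tokens with a generator's top
# predictions. It uses Google's pretrained ELECTRA-small generator as a
# stand-in; LM_DEMO.py loads your own generator weights instead.
tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-small-generator")
model = ElectraForMaskedLM.from_pretrained("google/electra-small-generator")

inputs = tokenizer("I love you so [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                      # (1, L, V)

# Replace each [MASK] position with the highest-probability token.
mask_positions = inputs["input_ids"] == tokenizer.mask_token_id
filled = inputs["input_ids"].clone()
filled[mask_positions] = logits.argmax(dim=-1)[mask_positions]
print(tokenizer.decode(filled[0], skip_special_tokens=True))
```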
- DDP-based training is available
- Benchmark metric correction
- Though the ELECTRA authors state that they do not back-propagate the discriminator loss through the generator because of the sampling step, it is actually possible to back-propagate through sampling by using the Gumbel-softmax trick. However, following richarddwang's repository, I remove the gradient graph for the sampling parts. (I used the Gumbel softmax provided by PyTorch, with a minor modification due to a bug.)
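For concreteness, a minimal sketch of both options with PyTorch's built-in F.gumbel_softmax; the dummy logits and shapes are illustrative only:

```python
import torch
import torch.nn.functional as F

# A minimal sketch of the two options with PyTorch's built-in Gumbel softmax.
# The dummy logits and shapes are illustrative only.
gen_logits = torch.randn(2, 8, 100, requires_grad=True)   # (B, L, vocab)

# Option 1: straight-through Gumbel softmax. hard=True returns one-hot samples
# whose gradient flows through the soft probabilities, so the discriminator
# loss could back-propagate into the generator.
one_hot = F.gumbel_softmax(gen_logits, tau=1.0, hard=True)
sampled_ids = one_hot.argmax(dim=-1)                      # (B, L) token ids

# Option 2 (what this repository does, following richarddwang's repo): detach
# the logits first, cutting the gradient graph at the sampling step.
detached_ids = F.gumbel_softmax(gen_logits.detach(), tau=1.0, hard=True).argmax(dim=-1)
```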