planemanner/ELECTRA

Info

  • For computational efficiency, I aim to train BERT-SMALL (as described in ELECTRA, ICLR 2020), and I use the ELECTRA framework rather than the vanilla BERT MLM task for pretraining.
  • I referred to richarddwang's repository for the implementation.
  • Ultimately, I will point out which hyperparameters work well for the BERT series, at least in the BERT-SMALL case.
  • In brief, this repository aims to be an implementation of ELECTRA.

About ELECTRA

  • As described in the BERT paper, BERT is trained in two steps.
  • First, pretrain a BERT model on two tasks: masked language modeling and next sentence prediction.
  • Second, fine-tune the pretrained BERT model for each downstream task.
    • Since I only want to check benchmark scores for some of my other tasks, this repository provides only the related code base and information.
  • As pointed out in ELECTRA (ICLR 2020), this is an inefficient way of pretraining a BERT model.
  • In this repository, the pretraining process therefore follows the ELECTRA framework for efficiency and performance; a minimal sketch of the objective is shown below.
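  • The block below is a minimal sketch of the ELECTRA objective (replaced-token detection) written with Hugging Face transformers. It is not this repository's training code: the model configs, the 15% masking probability, and the loss weight of 50 are illustrative values based on the paper's small-model setup.

      # Minimal sketch of the ELECTRA objective; not the repository's exact code.
      import torch
      import torch.nn.functional as F
      from transformers import ElectraConfig, ElectraForMaskedLM, ElectraForPreTraining

      # Small generator / larger discriminator (sizes are illustrative).
      generator = ElectraForMaskedLM(ElectraConfig(
          embedding_size=128, hidden_size=64, num_attention_heads=1, intermediate_size=256))
      discriminator = ElectraForPreTraining(ElectraConfig(
          embedding_size=128, hidden_size=256, num_attention_heads=4, intermediate_size=1024))

      def electra_step(input_ids, attention_mask, mask_token_id, mlm_prob=0.15, disc_weight=50.0):
          # 1) Mask ~15% of the tokens and train the generator on masked language modeling.
          masked = torch.rand(input_ids.shape, device=input_ids.device) < mlm_prob
          mlm_labels = input_ids.masked_fill(~masked, -100)          # -100 is ignored by the MLM loss
          gen_inputs = input_ids.masked_fill(masked, mask_token_id)
          gen_out = generator(gen_inputs, attention_mask=attention_mask, labels=mlm_labels)

          # 2) Sample replacement tokens from the generator. Sampling is not differentiable,
          #    so the gradient graph is cut here (see the Miscellaneous section).
          with torch.no_grad():
              sampled = torch.distributions.Categorical(logits=gen_out.logits).sample()
          corrupted = torch.where(masked, sampled, input_ids)
          is_replaced = (corrupted != input_ids).float()

          # 3) The discriminator predicts, for every token, whether it was replaced.
          disc_out = discriminator(corrupted, attention_mask=attention_mask)
          disc_loss = F.binary_cross_entropy_with_logits(disc_out.logits, is_replaced)

          # ELECTRA weights the discriminator loss much more heavily than the generator loss.
          return gen_out.loss + disc_weight * disc_loss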

Requirements

  • pytorch 1.7+, numpy, python 3.7, tqdm, transformers

Features

  • DDP-based pre-training
  • Simple word generation demo
  • Fine-tuning for downstream tasks (e.g., GLUE Benchmark)

Dataset

  • Pretraining: English Wikipedia, BooksCorpus
  • Before training, you should download the two datasets above and convert them into a single txt file. The converted file must contain one sentence per line, with each sentence terminated by \n (a conversion sketch follows the example). For example,
    • I love you so much. \n
    • The pig walks away from this farm. \n
    • ...
    • Tesla stock is going to be 2,000 dollars \n
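  • The snippet below is a minimal sketch of such a conversion; it is not a script shipped with this repository, and the input file names and the use of nltk's sentence tokenizer are assumptions.

      # Minimal sketch: merge raw text dumps into a one-sentence-per-line corpus.
      # File names and the nltk sentence splitter are assumptions, not part of this repository.
      import nltk

      nltk.download("punkt")                                   # sentence-tokenizer models

      input_files = ["wikipedia_en.txt", "bookscorpus.txt"]    # hypothetical raw dumps
      output_file = "pretraining_corpus.txt"

      with open(output_file, "w", encoding="utf-8") as out:
          for path in input_files:
              with open(path, "r", encoding="utf-8") as f:
                  for line in f:
                      line = line.strip()
                      if not line:
                          continue
                      # Split each paragraph into sentences; write one sentence per line.
                      for sentence in nltk.sent_tokenize(line):
                          out.write(sentence + "\n")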

Usage

  • For pretraining: if you want to train a model with DDP (a sketch of the DDP setup appears after this list), run

    • CUDA_VISIBLE_DEVICES={device ids} python Pretraining.py --multiprocessing_distributed 
      

    If you want to train a model with a single GPU, run

    • CUDA_VISIBLE_DEVICES=0 python Pretraining.py
      
  • For fine-tuning

    • will be updated
  • For LM_DEMO

    • If you want to test whether your model is working well, use LM_DEMO.py.
      • First, make a txt file consisting of the sentences that you want to change.
      • Second, give the path of the generator's weight file as an argument when you run the LM_DEMO.py script.
      • Run the LM_DEMO.py script as follows.
        python LM_DEMO.py
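  • For reference, the block below is a hedged sketch of what a --multiprocessing_distributed style launch typically does in PyTorch (one process per GPU via torch.multiprocessing.spawn). It is not copied from Pretraining.py; the function and argument names are illustrative.

      # Hedged sketch of a typical multiprocessing-distributed (DDP) launch.
      # NOT copied from Pretraining.py; names and arguments are illustrative.
      import os
      import torch
      import torch.distributed as dist
      import torch.multiprocessing as mp
      from torch.nn.parallel import DistributedDataParallel as DDP

      def worker(rank, world_size):
          # One process per GPU, NCCL backend for GPU training.
          os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
          os.environ.setdefault("MASTER_PORT", "29500")
          dist.init_process_group("nccl", rank=rank, world_size=world_size)
          torch.cuda.set_device(rank)

          model = torch.nn.Linear(10, 10).cuda(rank)      # placeholder for the ELECTRA model
          model = DDP(model, device_ids=[rank])
          # ... build a DistributedSampler-backed dataloader and run the training loop here ...

          dist.destroy_process_group()

      if __name__ == "__main__":
          world_size = torch.cuda.device_count()          # limited by CUDA_VISIBLE_DEVICES
          mp.spawn(worker, args=(world_size,), nprocs=world_size)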

Current Status

  • DDP-based training is available

To do

  • Benchmark metric correction

Miscellaneous

  • Although the ELECTRA paper's authors describe that they do not back-propagate the discriminator loss through the generator because of the sampling step, it is actually possible to back-propagate through it by using the Gumbel-Softmax trick. However, following richarddwang's repository, I remove the gradient graph for the sampling parts. (I used the Gumbel-Softmax provided by PyTorch with a minor modification due to a bug.) A brief sketch of both options is shown below.
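  • The snippet below illustrates both options (it is not the repository's exact code): cutting the gradient at the sampling step versus keeping it differentiable with torch.nn.functional.gumbel_softmax.

      # Illustrative comparison of the two sampling choices; not the repository's exact code.
      import torch
      import torch.nn.functional as F

      gen_logits = torch.randn(2, 8, 30522, requires_grad=True)    # (batch, seq_len, vocab)

      # Option A (used here, following richarddwang's repository): cut the gradient graph.
      with torch.no_grad():
          sampled_ids = torch.distributions.Categorical(logits=gen_logits).sample()

      # Option B: Gumbel-Softmax with hard=True yields one-hot samples through which gradients
      # can flow back to the generator (straight-through estimator). To keep the gradient, the
      # one-hot vectors would be multiplied with the discriminator's token-embedding table
      # instead of being converted to hard ids with argmax.
      one_hot = F.gumbel_softmax(gen_logits, tau=1.0, hard=True)   # (batch, seq_len, vocab)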
