On the Usage of Continual Learning for Out-of-Distribution Generalization in Pre-trained Language Models of Code
This is the official replication package associated with the FSE'23 submission: "On the Usage of Continual Learning for Out-of-Distribution Generalization in Pre-trained Language Models of Code".
In this README, we provide details on how to set up our codebase for experimenting with continual fine-tuning. We include links to download our datasets and models locally. Unfortunately, we cannot leverage HuggingFace's hub, as it does not allow double-blind sharing of repositories.
- Clone this repository using git.
- Set up a Python 3 virtual environment and install the requirements:

python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
Note that the PyTorch version pinned in requirements.txt may not be suitable for your environment. Make sure the installed version matches your CUDA version.
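To quickly check which PyTorch build is installed and whether it sees your GPU, you can run the short snippet below (it only assumes that torch was installed from the requirements):

import torch

print(torch.__version__)          # installed PyTorch version
print(torch.version.cuda)         # CUDA version the wheel was built against (None for CPU-only builds)
print(torch.cuda.is_available())  # True if a compatible GPU and driver are detected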
All the data used in our experiments can be downloaded here: https://zenodo.org/record/7600769#.Y9y_FXbML-h. The data were downloaded from GitHub using Google BigQuery. We do not share the full dataset of 68M Java methods extracted from GitHub, but we intend to publish it later.
Here is a description of each dataset:
- gh-java-methods-small-id-train: in-distribution pre-training data (~9M Java methods).
- gh-java-methods-small-id-valid: in-distribution validation data.
- gh-java-methods-small-id-test: in-distribution test data.
- gh-java-methods-small-ood-train: out-of-distribution fine-tuning data.
- gh-java-methods-small-ood-test: out-of-distribution test data.
The datasets are stored in the compressed Parquet format. You do not need to download the training/validation data if you do not intend to pre-train or fine-tune models. We recommend storing the dataset folders inside a data folder at the root of your local clone of this repository.
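If you want to inspect a split locally, a minimal sketch such as the one below should work, assuming pandas and pyarrow are installed and the archive was extracted into ./data/ as recommended; the exact file layout and column names depend on the downloaded folders.

import pandas as pd

# Quick look at the ID test split; adjust the path to the folder you extracted.
df = pd.read_parquet("./data/gh-java-methods-small-id-test")
print(df.shape)
print(df.columns.tolist())
print(df.head())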
We provide all model checkpoints used in our experiments. code-gpt-small refers to the decoder based on the GPT-2 architecture, and code-roberta-base refers to the encoder based on the RoBERTa architecture.
The models can be downloaded from the same link as the data: https://zenodo.org/record/7600769#.Y9y_FXbML-h.
The hyperparameters and details about the models' architectures can be found in the following configuration files:
- ./configuration/model/code-gpt2-small_config.json for the decoder.
- ./configuration/model/code-roberta-base_config.json for the encoder.
Here is a description of each model:
- code-gpt-small: checkpoint of the decoder after pre-training.
- code-roberta-base: checkpoint of the encoder after pre-training.
- code-gpt-small_ft_$method$: checkpoint of the decoder after fine-tuning using a specific method (i.e., rwalk, ewc, naive).
- code-roberta-base_$method$: checkpoint of the encoder after fine-tuning using a specific method (i.e., rwalk, ewc, naive).
Inside each fine-tuning checkpoint, you will find five sub-folders. Each sub-folder contains the checkpoint of the model after a specific fine-tuning step.
For the paper, we fine-tuned the models sequentially on the OOD datasets following the order General → Security → Android → Web → Guava. Therefore, the folder exp_0 contains the model's checkpoint after fine-tuning the pre-trained model on the General OOD dataset. The folder exp_1 contains the model's checkpoint after fine-tuning the model from checkpoint exp_0 on the Security OOD dataset, and so on.
We recommend storing the model folders inside a models folder at the root of your local clone of this repository.
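Since the checkpoints are GPT-2- and RoBERTa-based, they can presumably be loaded with the HuggingFace transformers API. The sketch below assumes the folders follow the standard transformers checkpoint layout and that the archives were extracted into ./models/ as recommended; adjust the paths to the folder names you actually downloaded.

from transformers import AutoModelForCausalLM, AutoModelForMaskedLM

# Pre-trained encoder (RoBERTa-like) checkpoint.
encoder = AutoModelForMaskedLM.from_pretrained("./models/code-roberta-base")

# Decoder (GPT-2-like) checkpoint fine-tuned with EWC, after the second
# sequential fine-tuning step (exp_1, i.e., General then Security).
decoder = AutoModelForCausalLM.from_pretrained("./models/code-gpt2-small_ft_ewc/exp_1")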
We use Hydra to run all our code through configuration files, which can be found in the configuration folder. You can either edit the .yaml files manually or override variables directly from the shell.
Additionally, we use Wandb for pre-training and fine-tuning; its usage is optional.
You do not need to pre-train or fine-tune any model, as we provide all the checkpoints. Nonetheless, we explain how to run pre-training and fine-tuning for the sake of completeness.
We leverage Accelerate and DeepSpeed to pre-train our models. We provide the configuration we used in the file accelerator_deepspeed.yaml.
To pre-train a GPT2-like or a RoBERTa-like model, run the following command:
accelerate launch --config_file accelerator_deepspeed.yaml run_pretraining.py \
model=(code-roberta-base | code-gpt2-small) \
run=pretraining \
use_wandb=(true | false) \
hydra=output_pretraining
You can change the model's hyperparameters by editing variables in the ./configuration/run/pretraining.yaml file. For instance, to set weight_decay to 0.01, set run.weight_decay=0.01.
You can fine-tune a model using the following command:
python run_finetuning.py \
run=finetuning \
run.strategy=(naive | cumulative | ewc | rwalk | si | replay) \
run.train_batch_size=8 \
run.valid_batch_size=8 \
run.learning_rate=5e-5 \
run.num_epochs_per_experience=10 \
run.patience=2 \
model.model_name_or_path=./models/code-gpt2-small \
model.model_base_path=./models/code-gpt2-small \
hydra=output_finetuning
This will output log and model files in a new directory:
./models/code-gpt2-small/ft_$method$/
where $method$ is the fine-tuning strategy passed via run.strategy.
You can change the hyperparameters of each fine-tuning strategy in the ./configuration/run/finetuning.yaml configuration file.
To run zero-shot inference on the in-distribution dataset, run the following command:
python run_inference.py \
run=inference \
run.dataset_name=./data/gh-java-methods-small-id-ft-test \
run.domain=all \
run.task=(call | usage) \
model.model_name_or_path=./models/code-gpt2-small \
hydra=output_inference
This will test code-gpt-small in a zero-shot fashion on the ID data and all domains (by concatenating the test sets). You can do separate zero-shot testing per domain by setting the variable run.domain to a specific domain (e.g., general, security, ...) as specified in the ./configuration/run/inference.yaml configuration file.
To run inference on a fine-tuned model, run the following command:
python run_inference.py \
run=inference \
run.dataset_name=./data/gh-java-methods-small-ood-test \
run.domain=(all | general | security | android | web | guava) \
run.task=(call | usage) \
model.model_name_or_path=./models/code-gpt2-small_ft_$method$/exp_$id$ \
hydra=output_inference
Note that fine-tuning produces five checkpoints, i.e., one after each fine-tuning step. Therefore, you need to specify which checkpoint you want to test, e.g., exp_0, exp_1, etc.
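To evaluate every fine-tuning step of a given strategy in one go, you can wrap the command above in a small loop. The sketch below is just a convenience wrapper around run_inference.py, using the ewc strategy and the call task as examples, and is not part of the replication scripts; adjust the model path to match your downloaded folders.

import subprocess

# Run OOD inference for each of the five sequential checkpoints (exp_0 ... exp_4).
for exp_id in range(5):
    subprocess.run(
        [
            "python", "run_inference.py",
            "run=inference",
            "run.dataset_name=./data/gh-java-methods-small-ood-test",
            "run.domain=all",
            "run.task=call",
            f"model.model_name_or_path=./models/code-gpt2-small_ft_ewc/exp_{exp_id}",
            "hydra=output_inference",
        ],
        check=True,
    )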