How to train an EncoderDecoder Adapter / Prediction Head needed? #575

Open
2 of 3 tasks
julianpollmann opened this issue Aug 12, 2023 · 1 comment
Labels: bug:encoder-decoder, bug (Something isn't working)

Comments

@julianpollmann

Environment info

  • adapter-transformers version: 3.2.1
  • Platform: Linux-6.2.0-27-generic-x86_64-with-glibc2.37
  • Python version: 3.10.9
  • PyTorch version (GPU?): 1.13.1 (GPU)

Details

Hey, I'm trying to train an adapter for a Seq2Seq task with language adapters. Since most of the language adapters on the Hub are pretrained for BERT or RoBERTa, I cannot use e.g. BART for the task adapter. I set up an EncoderDecoder model with bert-base-multilingual-cased as the base, but even with very little training data the training loss of the adapter training stagnates at a high level (~4) and the model does not predict anything meaningful. When fully fine-tuning with the same training settings, the training loss quickly drops to around 0. Setups I tried:

  • Training a task adapter with bart-base - works
  • Full fine-tuning an EncoderDecoder model based on bert-base-multilingual-cased using the Huggingface Trainer - works
  • Training a task adapter with an EncoderDecoder model based on bert-base-multilingual-cased - the model repeatedly predicts the same word; the training loss stagnates at a high level.

Base model setup

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-cased",
    "bert-base-multilingual-cased",
    tie_encoder_decoder=True
)

model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.eos_token_id = tokenizer.eos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.vocab_size = model.config.decoder.vocab_size

bad_words = ['[CLS]']
bad_words_ids = [tokenizer.vocab[token] for token in bad_words]
model.config.bad_words_ids = [bad_words_ids]
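
For completeness, a minimal sketch of the tokenizer setup assumed above (not shown in the snippet; mapping bos/eos to [CLS]/[SEP], as is common for bert2bert):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

# Assumption: a plain BERT tokenizer defines no bos/eos tokens, so map them
# to the existing [CLS]/[SEP] special tokens.
tokenizer.bos_token = tokenizer.cls_token
tokenizer.eos_token = tokenizer.sep_token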

Adapter setup

I tried to add a task adapter using multiple methods:

adapter_config = AdapterConfig.load("pfeiffer", reduction_factor=2)
model.add_adapter("simplification", config=adapter_config, set_active=True)
model.train_adapter(["simplification"])

or

setup_adapter_training(
    model=model,
    adapter_args=AdapterArguments(train_adapter=True),
    adapter_name="simplification",
    adapter_config_kwargs={"reduction_factor": 2}
)

When training an adapter using BART, a prediction head is added. With the EncoderDecoder model this seems to be missing: the saved adapter does not contain a head_config.json like the BART-trained adapter does.

What do I need to change to train this task adapter with an EncoderDecoder Model?
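
A quick diagnostic sketch (plain PyTorch, nothing adapter-specific) to see which parameters actually end up trainable after calling train_adapter:

# List all parameter tensors that still require gradients after train_adapter().
trainable = [name for name, param in model.named_parameters() if param.requires_grad]
print(f"{len(trainable)} trainable parameter tensors")
for name in trainable:
    print(name)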

Training setup

training_args = Seq2SeqTrainingArguments(
    predict_with_generate=True,
    evaluation_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    fp16=True, 
    output_dir="./bert2bert/",
    max_steps=3000,
    logging_steps=50,
    save_strategy="no",
    # eval_steps=50,
    learning_rate=3e-4,
    warmup_ratio=0.1,
    optim="adamw_torch",
    remove_unused_columns=False,
)

trainer = Seq2SeqAdapterTrainer(
    tokenizer=tokenizer,
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_dataset['train'],
    eval_dataset=tokenized_dataset['validation'],
)
train_results = trainer.train()
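
For reference, a minimal sketch for spot-checking generations after training (max_length and num_beams are illustrative values):

import torch

# Collate two validation examples with the same data collator used for training,
# so padding matches, then generate and decode the predicted token IDs.
batch = data_collator([tokenized_dataset["validation"][i] for i in range(2)])

with torch.no_grad():
    generated = model.generate(
        input_ids=batch["input_ids"].to(model.device),
        attention_mask=batch["attention_mask"].to(model.device),
        max_length=64,  # illustrative
        num_beams=4,    # illustrative
    )

print(tokenizer.batch_decode(generated, skip_special_tokens=True))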

EncoderDecoder adapter_config.json

{
  "config": {
    "adapter_residual_before_ln": false,
    "cross_adapter": false,
    "factorized_phm_W": true,
    "factorized_phm_rule": false,
    "hypercomplex_nonlinearity": "glorot-uniform",
    "init_weights": "bert",
    "inv_adapter": null,
    "inv_adapter_reduction_factor": null,
    "is_parallel": false,
    "learn_phm": true,
    "leave_out": [],
    "ln_after": false,
    "ln_before": false,
    "mh_adapter": false,
    "non_linearity": "relu",
    "original_ln_after": true,
    "original_ln_before": true,
    "output_adapter": true,
    "phm_bias": true,
    "phm_c_init": "normal",
    "phm_dim": 4,
    "phm_init_range": 0.0001,
    "phm_layer": false,
    "phm_rank": 1,
    "reduction_factor": 2,
    "residual_before_ln": true,
    "scaling": 1.0,
    "shared_W_phm": false,
    "shared_phm_rule": true,
    "use_gating": false
  },
  "hidden_size": null,
  "model_class": "EncoderDecoderModel",
  "model_name": null,
  "model_type": "encoder-decoder",
  "name": "simplification",
  "version": "3.2.1"
}

Bart adapter_config.json:

{
  "config": {
    "adapter_residual_before_ln": false,
    "cross_adapter": false,
    "factorized_phm_W": true,
    "factorized_phm_rule": false,
    "hypercomplex_nonlinearity": "glorot-uniform",
    "init_weights": "bert",
    "inv_adapter": null,
    "inv_adapter_reduction_factor": null,
    "is_parallel": false,
    "learn_phm": true,
    "leave_out": [],
    "ln_after": false,
    "ln_before": false,
    "mh_adapter": false,
    "non_linearity": "relu",
    "original_ln_after": true,
    "original_ln_before": true,
    "output_adapter": true,
    "phm_bias": true,
    "phm_c_init": "normal",
    "phm_dim": 4,
    "phm_init_range": 0.0001,
    "phm_layer": false,
    "phm_rank": 1,
    "reduction_factor": 8,
    "residual_before_ln": true,
    "scaling": 1.0,
    "shared_W_phm": false,
    "shared_phm_rule": true,
    "use_gating": false
  },
  "config_id": "26cd1b10db746518",
  "hidden_size": 768,
  "model_class": "BartForConditionalGeneration",
  "model_name": "facebook/bart-base",
  "model_type": "bart",
  "name": "simplification",
  "version": "3.2.1"
}
julianpollmann added the question (Further information is requested) label on Aug 12, 2023
@adapter-hub-bert
Member

This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.

calpt added the bug (Something isn't working) label and removed the question (Further information is requested) and Stale labels on Nov 19, 2023