Fix coverage penalty #125
Conversation

When I use the coverage penalty in my decoding config, I get an error.

The error has moved, I now have this one:

Traceback (most recent call last):
File "/usr/local/bin/eole", line 33, in <module>
sys.exit(load_entry_point('EOLE', 'console_scripts', 'eole')())
File "/workdir/eole/eole/bin/main.py", line 39, in main
bin_cls.run(args)
File "/workdir/eole/eole/bin/run/predict.py", line 42, in run
predict(config)
File "/workdireole/eole/bin/run/predict.py", line 18, in predict
_, _, _ = engine.infer_file()
File "/workdir/eole/eole/inference_engine.py", line 39, in infer_file
scores, estims, preds = self._predict(infer_iter)
File "/workdir/eole/eole/inference_engine.py", line 171, in _predict
scores, estims, preds = self.predictor._predict(
File "/workdir/eole/eole/predict/inference.py", line 473, in _predict
batch_data = self.predict_batch(batch, attn_debug)
File "/workdir/eole/eole/predict/generator.py", line 71, in predict_batch
return self._predict_batch_with_strategy(batch, decode_strategy)
File "/workdir/eole/eole/predict/generator.py", line 149, in _predict_batch_with_strategy
decode_strategy.advance(log_probs, attn)
File "/workdir/eole/eole/predict/beam_search.py", line 451, in advance
super(BeamSearchLM, self).advance(log_probs, attn)
File "/workdir/eole/eole/predict/beam_search.py", line 378, in advance
self.alive_attn = torch.cat([self.alive_attn, current_attn], 1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 357 but got size 358 for tensor number 1 in the list.
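
For context, torch.cat along dimension 1 requires every other dimension to match, so the two attention tensors disagree on the source length here. A minimal standalone sketch (shapes are illustrative, not taken from EOLE) reproduces the same RuntimeError:

```python
import torch

# accumulated attention history: (batch * beam, steps, src_len)
alive_attn = torch.zeros(10, 4, 357)
# attention for the new step, with a source length off by one
current_attn = torch.zeros(10, 1, 358)

# RuntimeError: Sizes of tensors must match except in dimension 1.
# Expected size 357 but got size 358 for tensor number 1 in the list.
torch.cat([alive_attn, current_attn], 1)
```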

When I comment out the lines related to alive_attn, it runs fine.

In fact I get the error with batch_size > 1; it is related to the attention slicing with
(step 1) tensor([0, 0, 2, 2, 4, 4, 6, 6, 8, 8], device='cuda:0')
(step 2) tensor([0, 0, 2, 2, 4, 4, 6, 7, 8, 8], device='cuda:0')
Is this the expected behavior?
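
Those tensors look like beam-origin indices: at each step every surviving hypothesis selects the row of its parent beam, so an entry changing from 6 to 7 between steps just means one hypothesis switched parents, which is normal in beam search. A hedged sketch of that reordering (names and shapes are assumptions, not EOLE's exact code):

```python
import torch

# accumulated attention history: (batch * beam, steps, src_len), illustrative
alive_attn = torch.randn(10, 4, 357)
# parent-beam index chosen by each surviving hypothesis at this step
select_indices = torch.tensor([0, 0, 2, 2, 4, 4, 6, 7, 8, 8])

# realign each hypothesis with the attention history of its parent beam
alive_attn = alive_attn.index_select(0, select_indices)
```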

This works well even for batch_size > 1, but it does not handle the coverage penalty together with returning attention.
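
A hedged sketch of the kind of guard being discussed, tracking the attention history only when prediction output or the coverage penalty actually consumes it (the function and argument names are illustrative, not the actual patch):

```python
import torch
from typing import Optional

def update_alive_attn(
    alive_attn: Optional[torch.Tensor],  # (batch*beam, steps, src_len) or None
    current_attn: torch.Tensor,          # (batch*beam, 1, src_len) for this step
    select_indices: torch.Tensor,        # parent-beam index per hypothesis
    return_attention: bool,
    has_cov_pen: bool,
) -> Optional[torch.Tensor]:
    """Skip the attention history unless something downstream needs it."""
    if not (return_attention or has_cov_pen):
        return None
    current_attn = current_attn.index_select(0, select_indices)
    if alive_attn is None:
        return current_attn  # first step starts the history
    # realign with surviving beams, then append the new step
    alive_attn = alive_attn.index_select(0, select_indices)
    return torch.cat([alive_attn, current_attn], 1)
```
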
Can you provide more details and/or a minimal example of how to reproduce the error?

Maybe I missed something. I am using this config:

transforms: [onmt_tokenize]
transforms_configs:
onmt_tokenize:
src_subword_type: bpe
src_subword_model: "${EOLE_MODEL_DIR}/llama3.1-8b-instruct/bpe.model"
tgt_subword_type: bpe
tgt_subword_model: "${EOLE_MODEL_DIR}/llama3.1-8b-instruct/bpe.model"
gpt2_pretok: true
# Model info
model_path: "${EOLE_MODEL_DIR}/llama3.1-8b-instruct"
# Inference
seed: 42
max_length: 1000
batch_type: sents
batch_size: 1
world_size: 1
gpu_ranks: [0]
compute_dtype: bfloat16
beam_size: 2
n_best: 1
report_time: true
src: None
coverage_penalty: 'wu'
beta: 1
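
For reference, coverage_penalty: 'wu' selects the coverage term of the Wu et al. (2016) length/coverage scorer. A hedged sketch of the computation (a minimal reimplementation for illustration, not necessarily EOLE's exact code):

```python
import torch

def coverage_wu(cov: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """Wu et al. (2016) coverage penalty.

    cov: attention summed over decoding steps, shape (batch*beam, src_len).
    beta corresponds to the `beta` option in the config above; the result
    is subtracted from the beam score.
    """
    # log(min(cov_j, 1.0)) is 0 once a source position is fully covered
    # and negative while it is under-covered
    penalty = -torch.min(cov, cov.clone().fill_(1.0)).log().sum(-1)
    return beta * penalty
```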

Either I'm missing something or there are some specificities in your setup leading to this issue.

root@eda4baa52803:/work/recipes/llama3.1# eole convert HF --model_dir meta-llama/Meta-Llama-3.1-8B-Instruct --output $EOLE_MODEL_DIR/llama3.1-8b-instruct --token $HF_TOKEN
starting output shard: 1/1
model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors
Loading model-00001-of-00004.safetensors
Loading model-00002-of-00004.safetensors
Loading model-00003-of-00004.safetensors
Loading model-00004-of-00004.safetensors
Saving output model shard: 0
You have a CUDA device, should run with -gpu_ranks
root@eda4baa52803:/work/recipes/llama3.1# echo -e "What are some nice places to visit in France?" | sed ':a;N;$!ba;s/\n/⦅newline⦆/g' > test_prompt.txt
root@eda4baa52803:/work/recipes/llama3.1# eole predict -c test_coverage.yaml -src test_prompt.txt -output test_output.txt
[2024-10-08 12:57:16,640 INFO] Loading checkpoint from /mnt/ssd0/models/eole/llama3.1-8b-instruct
[2024-10-08 12:57:16,712 INFO] Building model...
[2024-10-08 12:57:16,863 INFO] Loading data into the model
[2024-10-08 12:57:27,988 INFO] Transforms applied: ['onmt_tokenize']
[2024-10-08 12:58:03,951 INFO] PRED SCORE: -0.3467, PRED PPL: 1.41 NB SENTENCES: 1
[2024-10-08 12:58:03,951 INFO] ESTIM SCORE: 1.0000, ESTIM PPL: 0.37 NB SENTENCES: 1
[2024-10-08 12:58:03,951 INFO] Total prediction time (s): 36.0
[2024-10-08 12:58:03,951 INFO] Average prediction time (ms): 35961.6
[2024-10-08 12:58:03,951 INFO] Tokens per second: 27.8
[2024-10-08 12:58:03,951 INFO] pred_words_total: 1000.0
Time w/o python interpreter load/terminate: 47.40990853309631

I ran your test and got the same error, then I reconverted the model and the error disappeared.