
Fix coverage penalty #125

Closed
wants to merge 2 commits into from

Conversation

l-k-11235
Contributor

When I use

beam_size: 2
coverage_penalty: 'wu'
beta: 1

in my decoding config, I get the following error:

Traceback (most recent call last):
  File "/usr/local/bin/eole", line 33, in <module>
    sys.exit(load_entry_point('EOLE', 'console_scripts', 'eole')())
  File "/workdir/eole/eole/bin/main.py", line 39, in main
    bin_cls.run(args)
  File "/workdir/eole/eole/bin/run/predict.py", line 42, in run
    predict(config)
  File "/workdir/eole/eole/bin/run/predict.py", line 18, in predict
    _, _, _ = engine.infer_file()
  File "/workdir/eole/eole/inference_engine.py", line 38, in infer_file
    scores, estims, preds = self._predict(infer_iter)
  File "/workdir/eole/eole/inference_engine.py", line 170, in _predict
    scores, estims, preds = self.predictor._predict(
  File "/workdir/eole/eole/predict/inference.py", line 475, in _predict
    batch_data = self.predict_batch(batch, attn_debug)
  File "/workdir/eole/eole/predict/generator.py", line 71, in predict_batch
    return self._predict_batch_with_strategy(batch, decode_strategy)
  File "/workdir/eole/eole/predict/generator.py", line 149, in _predict_batch_with_strategy
    decode_strategy.advance(log_probs, attn)
  File "/workdir/eole/eole/predict/beam_search.py", line 437, in advance
    super(BeamSearchLM, self).advance(log_probs, attn)
  File "/workdir/eole/eole/predict/beam_search.py", line 383, in advance
    self.topk_scores -= cov_penalty.view(_B, self.beam_size).float()
RuntimeError: shape '[1, 2]' is invalid for input of size 518
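For reference, the 'wu' option refers to the coverage penalty of Wu et al. (2016): beta times the sum over source positions of log(min(coverage_i, 1.0)), where coverage_i is the attention mass accumulated on source token i. The resulting per-hypothesis penalty is subtracted from the beam scores after a reshape to (batch, beam_size), which is the view that fails above. A minimal pure-Python sketch of the formula (illustrative only, not the eole implementation):

```python
import math

def wu_coverage_penalty(attn_steps, beta=1.0):
    """Wu et al. (2016) coverage penalty:
    beta * sum_i log(min(coverage_i, 1.0)), where coverage_i is the
    total attention mass that source token i has received across all
    decoding steps. attn_steps is a list of per-step attention
    distributions over the source tokens (one list per target step).
    """
    num_src = len(attn_steps[0])
    penalty = 0.0
    for i in range(num_src):
        coverage = sum(step[i] for step in attn_steps)
        # Capping at 1.0 means fully covered tokens contribute log(1) = 0,
        # so only under-attended source tokens are penalized.
        penalty += math.log(min(coverage, 1.0))
    return beta * penalty
```

With uniform attention that fully covers the source, the penalty is zero; any under-attended source token makes it negative.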

@l-k-11235
Contributor Author

The error has moved; I now get this one:

Traceback (most recent call last):
  File "/usr/local/bin/eole", line 33, in <module>
    sys.exit(load_entry_point('EOLE', 'console_scripts', 'eole')())
  File "/workdir/eole/eole/bin/main.py", line 39, in main
    bin_cls.run(args)
  File "/workdir/eole/eole/bin/run/predict.py", line 42, in run
    predict(config)
  File "/workdir/eole/eole/bin/run/predict.py", line 18, in predict
    _, _, _ = engine.infer_file()
  File "/workdir/eole/eole/inference_engine.py", line 39, in infer_file
    scores, estims, preds = self._predict(infer_iter)
  File "/workdir/eole/eole/inference_engine.py", line 171, in _predict
    scores, estims, preds = self.predictor._predict(
  File "/workdir/eole/eole/predict/inference.py", line 473, in _predict
    batch_data = self.predict_batch(batch, attn_debug)
  File "/workdir/eole/eole/predict/generator.py", line 71, in predict_batch
    return self._predict_batch_with_strategy(batch, decode_strategy)
  File "/workdir/eole/eole/predict/generator.py", line 149, in _predict_batch_with_strategy
    decode_strategy.advance(log_probs, attn)
  File "/workdir/eole/eole/predict/beam_search.py", line 451, in advance
    super(BeamSearchLM, self).advance(log_probs, attn)
  File "/workdir/eole/eole/predict/beam_search.py", line 378, in advance
    self.alive_attn = torch.cat([self.alive_attn, current_attn], 1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 357 but got size 358 for tensor number 1 in the list.
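For context, torch.cat along dim 1 requires every other dimension to agree; the error above means alive_attn and current_attn disagree on the source-length dimension (357 vs. 358). A toy 2-D sketch of that constraint, using nested lists in place of tensors (illustrative only):

```python
def cat_dim1(a, b):
    """Mimic torch.cat([a, b], dim=1) on nested lists shaped
    [rows][cols]: concatenation extends dim 1 (columns), so dim 0
    (rows) must match exactly, as with the real torch.cat."""
    if len(a) != len(b):
        raise RuntimeError(
            "Sizes of tensors must match except in dimension 1. "
            f"Expected size {len(a)} but got size {len(b)}"
        )
    return [row_a + row_b for row_a, row_b in zip(a, b)]
```

An off-by-one anywhere in the non-concatenated dimensions (here, the attention's source length) triggers exactly this kind of RuntimeError.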

@l-k-11235
Contributor Author

When I comment out the lines related to alive_attn, it works fine.

@l-k-11235
Contributor Author

In fact, I get an error with batch_size > 1, related to the attention slicing with select_indices.
I printed select_indices with beam_size: 2 and batch_size: 5, and I get:

(step 1)

tensor([0, 0, 2, 2, 4, 4, 6, 6, 8, 8], device='cuda:0')

(step 2)

tensor([0, 0, 2, 2, 4, 4, 6, 7, 8, 8], device='cuda:0')

Is this the expected behavior?
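For context, in the usual flattened (batch × beam) layout, entry k of select_indices names the parent row of surviving hypothesis k, so repeated entries (0, 0) mean one parent spawned both surviving beams while distinct entries (6, 7) mean both beams of that batch item survived. A minimal sketch of the reordering it drives (illustrative, not the eole code):

```python
def reorder_state(alive, select_indices):
    """Gather surviving hypotheses from their parent rows after a
    beam step: new row k is old row select_indices[k]. With
    batch_size=5 and beam_size=2 the state has 10 rows, laid out
    batch-major: rows 2b and 2b+1 belong to batch item b."""
    return [alive[i] for i in select_indices]
```

Under this layout, the step-2 indices above (..., 6, 7, ...) simply mean both beams of batch item 3 survived, so indices diverging between steps is expected once beams stop duplicating a single parent.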

@l-k-11235
Contributor Author

l-k-11235 commented Oct 7, 2024

This works well even for batch_size > 1, but it does not handle the coverage penalty together with returning attention.
In fact, I don't really understand why the return-attention path is linked to the coverage-penalty path. Do you have any idea?
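One plausible explanation for the coupling, sketched below (the function name and logic are illustrative assumptions, not the eole code): the coverage penalty is a function of the accumulated attention, so enabling it forces the decoder to keep per-step attention alive, the same alive_attn bookkeeping used when the caller asks for attention back.

```python
def must_track_attention(return_attn, coverage_penalty):
    """Attention history is needed either when the caller asks for it
    (return_attn) or when a coverage penalty is active, since the
    penalty is computed from the accumulated attention. This shared
    requirement would explain why the two code paths are linked."""
    return bool(return_attn) or coverage_penalty not in (None, "none")
```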

@francoishernandez
Contributor

Can you provide more details and/or a minimal example on how to reproduce the error?
Enabling -coverage_penalty wu -beta 1 on a few setups does not seem to raise any issue on my end.
Also, we can probably close #119 if this one is supposed to replace it.

@l-k-11235
Contributor Author

l-k-11235 commented Oct 8, 2024

Maybe I missed something... I am using this config:

transforms: [onmt_tokenize]
transforms_configs:
    onmt_tokenize:
        src_subword_type: bpe
        src_subword_model: "${EOLE_MODEL_DIR}/llama3.1-8b-instruct/bpe.model"
        tgt_subword_type: bpe
        tgt_subword_model: "${EOLE_MODEL_DIR}/llama3.1-8b-instruct/bpe.model"
        gpt2_pretok: true

# Model info
model_path: "${EOLE_MODEL_DIR}/llama3.1-8b-instruct"

# Inference
seed: 42
max_length: 1000
batch_type: sents
batch_size: 1
world_size: 1
gpu_ranks: [0]
compute_dtype: bfloat16
beam_size: 2
n_best: 1
report_time: true
src: None
coverage_penalty: 'wu'
beta: 1

@francoishernandez
Contributor

Either I'm missing something, or there are some specificities in your setup leading to this issue.
The config you provided is not sufficient on its own to reproduce.
See the successfully running commands below, where test_coverage.yaml is pasted from your config.

root@eda4baa52803:/work/recipes/llama3.1# eole convert HF --model_dir meta-llama/Meta-Llama-3.1-8B-Instruct --output $EOLE_MODEL_DIR/llama3.1-8b-instruct --token $HF_TOKEN
starting output shard: 1/1
model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors
Loading model-00001-of-00004.safetensors
Loading model-00002-of-00004.safetensors
Loading model-00003-of-00004.safetensors
Loading model-00004-of-00004.safetensors
Saving output model shard: 0
You have a CUDA device, should run with -gpu_ranks
You have a CUDA device, should run with -gpu_ranks
root@eda4baa52803:/work/recipes/llama3.1# echo -e "What are some nice places to visit in France?" | sed ':a;N;$!ba;s/\n/⦅newline⦆/g' > test_prompt.txt
root@eda4baa52803:/work/recipes/llama3.1# eole predict -c test_coverage.yaml -src test_prompt.txt -output test_output.txt
[2024-10-08 12:57:16,640 INFO] Loading checkpoint from /mnt/ssd0/models/eole/llama3.1-8b-instruct
[2024-10-08 12:57:16,712 INFO] Building model...
[2024-10-08 12:57:16,863 INFO] Loading data into the model
[2024-10-08 12:57:27,988 INFO] Transforms applied: ['onmt_tokenize']
[2024-10-08 12:58:03,951 INFO] PRED SCORE: -0.3467, PRED PPL: 1.41 NB SENTENCES: 1
[2024-10-08 12:58:03,951 INFO] ESTIM SCORE: 1.0000, ESTIM PPL: 0.37 NB SENTENCES: 1
[2024-10-08 12:58:03,951 INFO] Total prediction time (s): 36.0
[2024-10-08 12:58:03,951 INFO] Average prediction time (ms): 35961.6
[2024-10-08 12:58:03,951 INFO] Tokens per second: 27.8
[2024-10-08 12:58:03,951 INFO] pred_words_total: 1000.0
Time w/o python interpreter load/terminate:  47.40990853309631

@l-k-11235
Contributor Author

I ran your test and got the same error; then I reconverted the model and the error disappeared.
So that means I have to reconvert my model to be able to apply the coverage penalty with the main branch, but it's not a big deal. Thanks for your help.

@l-k-11235 l-k-11235 closed this Oct 9, 2024