Fix coverage penalty #125
Conversation

When I use the coverage penalty in my decoding config, I get an error.

The error has moved, I now have this one:

Traceback (most recent call last):
File "/usr/local/bin/eole", line 33, in <module>
sys.exit(load_entry_point('EOLE', 'console_scripts', 'eole')())
File "/workdir/eole/eole/bin/main.py", line 39, in main
bin_cls.run(args)
File "/workdir/eole/eole/bin/run/predict.py", line 42, in run
predict(config)
File "/workdireole/eole/bin/run/predict.py", line 18, in predict
_, _, _ = engine.infer_file()
File "/workdir/eole/eole/inference_engine.py", line 39, in infer_file
scores, estims, preds = self._predict(infer_iter)
File "/workdir/eole/eole/inference_engine.py", line 171, in _predict
scores, estims, preds = self.predictor._predict(
File "/workdir/eole/eole/predict/inference.py", line 473, in _predict
batch_data = self.predict_batch(batch, attn_debug)
File "/workdir/eole/eole/predict/generator.py", line 71, in predict_batch
return self._predict_batch_with_strategy(batch, decode_strategy)
File "/workdir/eole/eole/predict/generator.py", line 149, in _predict_batch_with_strategy
decode_strategy.advance(log_probs, attn)
File "/workdir/eole/eole/predict/beam_search.py", line 451, in advance
super(BeamSearchLM, self).advance(log_probs, attn)
File "/workdir/eole/eole/predict/beam_search.py", line 378, in advance
self.alive_attn = torch.cat([self.alive_attn, current_attn], 1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 357 but got size 358 for tensor number 1 in the list.
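
For context, torch.cat along dimension 1 requires every other dimension to match, so the two attention tensors disagree on the source length here. A minimal standalone sketch (shapes are illustrative, not taken from EOLE) reproduces the same RuntimeError:

```python
import torch

# accumulated attention history: (batch * beam, steps, src_len)
alive_attn = torch.zeros(10, 4, 357)
# attention for the new step, with a source length off by one
current_attn = torch.zeros(10, 1, 358)

# RuntimeError: Sizes of tensors must match except in dimension 1.
# Expected size 357 but got size 358 for tensor number 1 in the list.
torch.cat([alive_attn, current_attn], 1)
```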

When I comment out the lines related to alive_attn, it runs fine.

In fact I get the error with batch_size > 1; it is related to the attention slicing with
(step 1) tensor([0, 0, 2, 2, 4, 4, 6, 6, 8, 8], device='cuda:0')
(step 2) tensor([0, 0, 2, 2, 4, 4, 6, 7, 8, 8], device='cuda:0')
Is this the expected behavior?
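
Those tensors look like beam-origin indices: at each step every surviving hypothesis selects the row of its parent beam, so an entry changing from 6 to 7 between steps just means one hypothesis switched parents, which is normal in beam search. A hedged sketch of that reordering (names and shapes are assumptions, not EOLE's exact code):

```python
import torch

# accumulated attention history: (batch * beam, steps, src_len), illustrative
alive_attn = torch.randn(10, 4, 357)
# parent-beam index chosen by each surviving hypothesis at this step
select_indices = torch.tensor([0, 0, 2, 2, 4, 4, 6, 7, 8, 8])

# realign each hypothesis with the attention history of its parent beam
alive_attn = alive_attn.index_select(0, select_indices)
```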

This works well even for batch_size > 1, but it does not handle the coverage penalty together with returning attention.
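
A hedged sketch of the kind of guard being discussed, tracking the attention history only when prediction output or the coverage penalty actually consumes it (the function and argument names are illustrative, not the actual patch):

```python
import torch
from typing import Optional

def update_alive_attn(
    alive_attn: Optional[torch.Tensor],  # (batch*beam, steps, src_len) or None
    current_attn: torch.Tensor,          # (batch*beam, 1, src_len) for this step
    select_indices: torch.Tensor,        # parent-beam index per hypothesis
    return_attention: bool,
    has_cov_pen: bool,
) -> Optional[torch.Tensor]:
    """Skip the attention history unless something downstream needs it."""
    if not (return_attention or has_cov_pen):
        return None
    current_attn = current_attn.index_select(0, select_indices)
    if alive_attn is None:
        return current_attn  # first step starts the history
    # realign with surviving beams, then append the new step
    alive_attn = alive_attn.index_select(0, select_indices)
    return torch.cat([alive_attn, current_attn], 1)
```
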
Can you provide more details and/or a minimal example of how to reproduce the error?

Maybe I missed something. I am using this config:

transforms: [onmt_tokenize]
transforms_configs:
onmt_tokenize:
src_subword_type: bpe
src_subword_model: "${EOLE_MODEL_DIR}/llama3.1-8b-instruct/bpe.model"
tgt_subword_type: bpe
tgt_subword_model: "${EOLE_MODEL_DIR}/llama3.1-8b-instruct/bpe.model"
gpt2_pretok: true
# Model info
model_path: "${EOLE_MODEL_DIR}/llama3.1-8b-instruct"
# Inference
seed: 42
max_length: 1000
batch_type: sents
batch_size: 1
world_size: 1
gpu_ranks: [0]
compute_dtype: bfloat16
beam_size: 2
n_best: 1
report_time: true
src: None
coverage_penalty: 'wu'
beta: 1
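
For reference, coverage_penalty: 'wu' selects the coverage term of the Wu et al. (2016) length/coverage scorer. A hedged sketch of the computation (a minimal reimplementation for illustration, not necessarily EOLE's exact code):

```python
import torch

def coverage_wu(cov: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """Wu et al. (2016) coverage penalty.

    cov: attention summed over decoding steps, shape (batch*beam, src_len).
    beta corresponds to the `beta` option in the config above; the result
    is subtracted from the beam score.
    """
    # log(min(cov_j, 1.0)) is 0 once a source position is fully covered
    # and negative while it is under-covered
    penalty = -torch.min(cov, cov.clone().fill_(1.0)).log().sum(-1)
    return beta * penalty
```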

Either I'm missing something or there are some specificities in your setup leading to this issue.

root@eda4baa52803:/work/recipes/llama3.1# eole convert HF --model_dir meta-llama/Meta-Llama-3.1-8B-Instruct --output $EOLE_MODEL_DIR/llama3.1-8b-instruct --token $HF_TOKEN
starting output shard: 1/1
model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors
Loading model-00001-of-00004.safetensors
Loading model-00002-of-00004.safetensors
Loading model-00003-of-00004.safetensors
Loading model-00004-of-00004.safetensors
Saving output model shard: 0
You have a CUDA device, should run with -gpu_ranks
root@eda4baa52803:/work/recipes/llama3.1# echo -e "What are some nice places to visit in France?" | sed ':a;N;$!ba;s/\n/⦅newline⦆/g' > test_prompt.txt
root@eda4baa52803:/work/recipes/llama3.1# eole predict -c test_coverage.yaml -src test_prompt.txt -output test_output.txt
[2024-10-08 12:57:16,640 INFO] Loading checkpoint from /mnt/ssd0/models/eole/llama3.1-8b-instruct
[2024-10-08 12:57:16,712 INFO] Building model...
[2024-10-08 12:57:16,863 INFO] Loading data into the model
[2024-10-08 12:57:27,988 INFO] Transforms applied: ['onmt_tokenize']
[2024-10-08 12:58:03,951 INFO] PRED SCORE: -0.3467, PRED PPL: 1.41 NB SENTENCES: 1
[2024-10-08 12:58:03,951 INFO] ESTIM SCORE: 1.0000, ESTIM PPL: 0.37 NB SENTENCES: 1
[2024-10-08 12:58:03,951 INFO] Total prediction time (s): 36.0
[2024-10-08 12:58:03,951 INFO] Average prediction time (ms): 35961.6
[2024-10-08 12:58:03,951 INFO] Tokens per second: 27.8
[2024-10-08 12:58:03,951 INFO] pred_words_total: 1000.0
Time w/o python interpreter load/terminate: 47.40990853309631

I ran your test and got the same error, then I reconverted the model and the error disappeared.