Flan-T5 small converted model produces wrong results with batch size > 1 and long sentences #21053
Labels
model:transformer (issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.)
stale (issues that have not been addressed in a while; categorized by a bot)
Describe the issue
A Flan-T5 small model, fine-tuned for spellchecking, was converted with
https://github.com/microsoft/onnxruntime/blob/8f0e896c95eedee624b6a3c375e86e1e4263a980/onnxruntime/python/tools/transformers/convert_generation.py
Using:
python convert_generation.py -m t5-small --model_type t5 --output ./models/t5/onnx_models/t5_small_beam_search.onnx --use_gpu --past_present_share_buffer --use_decoder_masked_attention
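Before running inference it can help to confirm the input and output names the exported beam-search graph actually exposes, since the sketch further below assumes the usual names produced by convert_generation.py. This is a minimal, hypothetical check; the model path is the one from the command above.

import onnxruntime as ort

# Inspect the exported graph's I/O names (assumed path from the conversion command above).
sess = ort.InferenceSession(
    "./models/t5/onnx_models/t5_small_beam_search.onnx",
    providers=["CUDAExecutionProvider"],
)
print([i.name for i in sess.get_inputs()])   # e.g. input_ids, max_length, min_length, ...
print([o.name for o in sess.get_outputs()])  # e.g. sequences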
It works perfectly with batch size 1; however, when the batch size is increased, the output for some samples is incorrect.
Why does the length of the other samples in a batch (or the batch size itself) affect the output?
To reproduce
Infer using onnxruntime/onnxruntime/python/tools/transformers/convert_generation.py (line 648 at commit 8f0e896).
The output for the 2nd sample is incorrect when the input is
text = ['Obierika, Okonkwo best friend even stated.', 'loving it.']
but correct when the same sentence is passed on its own as
text = ['loving it.']
A sketch of this comparison follows.
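A minimal reproduction sketch, under these assumptions: the exported beam-search graph exposes the standard inputs created by convert_generation.py (input_ids, max_length, min_length, num_beams, num_return_sequences, length_penalty, repetition_penalty) and a single "sequences" output; the model path and the t5-small tokenizer are placeholders standing in for the fine-tuned checkpoint. The generation parameter values are illustrative, not the ones used when the bug was observed.

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

MODEL_PATH = "./models/t5/onnx_models/t5_small_beam_search.onnx"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained("t5-small")             # placeholder tokenizer

session = ort.InferenceSession(MODEL_PATH, providers=["CUDAExecutionProvider"])

def generate(texts):
    # Pad the batch to a common length; shorter sentences get pad tokens appended.
    enc = tokenizer(texts, return_tensors="np", padding=True)
    inputs = {
        "input_ids": enc["input_ids"].astype(np.int32),
        "max_length": np.array([64], dtype=np.int32),
        "min_length": np.array([1], dtype=np.int32),
        "num_beams": np.array([4], dtype=np.int32),
        "num_return_sequences": np.array([1], dtype=np.int32),
        "length_penalty": np.array([1.0], dtype=np.float32),
        "repetition_penalty": np.array([1.0], dtype=np.float32),
    }
    # sequences has shape [batch, num_return_sequences, max_length].
    sequences = session.run(["sequences"], inputs)[0]
    return [tokenizer.decode(seq[0], skip_special_tokens=True) for seq in sequences]

# Batch of two sentences with very different lengths: the 2nd output comes back wrong.
print(generate(["Obierika, Okonkwo best friend even stated.", "loving it."]))

# The same short sentence alone (batch size 1): the output is correct.
print(generate(["loving it."]))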
Urgency
No response
Platform
Linux
OS Version
Debian 11 5.10.0-30-cloud-amd64
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-gpu 1.18.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 12.2