Early stopping in Hugging Face models #859
Early stopping for the ORT beam search op is an attribute, not an input: https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftbeamsearch. In Olive, we already enable early stopping: https://github.com/microsoft/Olive/blob/main/olive/passes/onnx/insert_beam_search.py#L145. For the full Olive model that also has pre/post processing using ort-extensions, it is already enabled as well: https://github.com/microsoft/Olive/blob/main/olive/passes/utils/whisper_prepost.py#L17.
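For illustration, here is a minimal sketch (not Olive's actual pass) of how such an attribute is attached when the node is built with the `onnx` helper API. The input/output names are assumptions, and the real `BeamSearch` node also carries decoder/encoder subgraph attributes that are omitted here:

```python
from onnx import helper

# Sketch only: a com.microsoft BeamSearch node with early stopping set as a
# node attribute rather than a runtime input. Names are illustrative, and the
# required decoder subgraph attribute is omitted for brevity.
beam_search = helper.make_node(
    "BeamSearch",
    inputs=["input_ids", "max_length", "min_length", "num_beams",
            "num_return_sequences", "length_penalty", "repetition_penalty"],
    outputs=["sequences"],
    domain="com.microsoft",
    early_stopping=1,    # attribute baked into the graph, not passed at runtime
    eos_token_id=50256,  # <|endoftext|> id discussed later in this thread
    pad_token_id=50256,
)
```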
@vymao do you have any follow-up questions or comments? Or can we close this issue?
I have the exact same problem. It seems the model does not take the end-of-sequence token id (eos_token_id) into account at all.
There is a new PR in the onnxruntime repo where the token ids will be set explicitly while creating the beam search node: microsoft/onnxruntime#19509. Previously, only a few token ids were set and the rest were inferred using hard-coded offsets. This did not work for all models since the vocabs are not always the same across the different variants of Whisper. Once the PR is checked in, I will test the changes using the ort-nightly build and keep you posted once I get to try it out.

Update: Please ignore the above. The issue is unrelated to the linked PR.
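For context on the offset problem: the special token ids genuinely differ across Whisper variants, which is easy to confirm by reading them from the Hugging Face configs instead of deriving them with fixed offsets. A hedged sketch, where the model names are just examples:

```python
from transformers import WhisperConfig

# The ids differ across variants, so offsets derived from one vocab can point
# at the wrong tokens in another.
for name in ("openai/whisper-tiny.en", "openai/whisper-tiny"):
    cfg = WhisperConfig.from_pretrained(name)
    print(name, cfg.eos_token_id, cfg.decoder_start_token_id, cfg.pad_token_id)
```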
Yes, this is exactly what is happening. Once the model generates the EOS token id, early stopping is detected and the output is then automatically padded with the EOS token id until the max length is reached. This is done by design because the output shape is already predefined to the max length. The extra EOS token ids can easily be removed during post-processing. |
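For example, a minimal post-processing sketch along those lines (the helper name and sample token ids are ours; 50256 is the `<|endoftext|>` id discussed in this thread):

```python
def strip_eos_padding(sequence, eos_token_id=50256):
    """Keep tokens up to and including the first EOS; drop the padded tail."""
    sequence = list(sequence)
    if eos_token_id in sequence:
        return sequence[: sequence.index(eos_token_id) + 1]
    return sequence

# A beam search output padded out to max_length with repeated EOS ids:
padded = [101, 2023, 2003, 1037, 3231, 50256, 50256, 50256]
print(strip_eos_padding(padded))  # [101, 2023, 2003, 1037, 3231, 50256]
```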
@kunal-vaishnavi thanks for the clarification! I was not aware of the padding behavior since the final model in Olive uses a post-processor that strips the special tokens. So the ORT PR I linked above is unrelated to this issue, since we always had the eos_token_id in the beam search node (Olive/olive/passes/onnx/insert_beam_search.py, line 141 at b96fb29).
Closing issue since early stopping is already enabled. Please see response from Kunal above for more clarification. |
I am trying to enable early stopping for models derived from Hugging Face, specifically Whisper. I am curious whether ONNX models generated via Olive respect this setting in the generation config, as it seems that setting it has no effect.
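For reference, a minimal sketch of the Hugging Face generation path being compared against, with early stopping enabled; the model name and the silent audio input are illustrative:

```python
import numpy as np
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Minimal sketch, assuming whisper-tiny.en; the audio is one second of
# silence just to keep the example self-contained.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")
processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")

audio = np.zeros(16000, dtype=np.float32)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# early_stopping only applies to beam search, hence num_beams > 1.
predicted_ids = model.generate(
    inputs.input_features,
    num_beams=5,
    early_stopping=True,
)
print(processor.batch_decode(predicted_ids, skip_special_tokens=False))
```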
If I follow the conditional generation of Whisper on Hugging Face, generation stops at the `<|endoftext|>` token, 50256. However, if I run the optimized Olive model in a C++ environment, the `<|endoftext|>` token seems to be continuously generated. I would have expected `<|endoftext|>` to be generated only once, as it becomes pointless to continue generation after that. Does the ONNX model already do this, and are the extra `<|endoftext|>` tokens just padding?

I have branched from the Whisper example provided in the examples folder and added the early stopping parameter to the dataset, so my inputs look like this:
Other information