I'm having trouble exporting the Helsinki-NLP/opus-mt-es-en model for language translation into the optimised OpenVINO IR format. Reading through the other issues in this repository turned up issue #188, which seems to describe similar symptoms.
In that case, the problem was with the BigBird architecture and its lack of support in Hugging Face Optimum. However, the Helsinki-NLP/opus-mt-es-en model belongs to the MarianMT class, which is documented as supported.
Am I missing something fundamental here? Is the conversion of MarianMT models into OpenVINO IR format currently unsupported by this library, in a similar way to the BigBird models in the issue above? Or are there aspects of the conversion that I am not specifying correctly, such that the export is sub-optimal? From the documentation, it would seem that this should be possible.
In case it helps, I see the following warning in the build logs: Asked a sequence length of 16, but a sequence length of 1 will be used with use_past == True for 'decoder_input_ids'.
Calling run("Hola, como estas?") yields an inference time of 0.6323761940002441s, while using the exported OVIR binaries in an OVMS model pipeline yields an inference time of 45s.
Any help on this one would be greatly appreciated, cheers!
P.S. I can post the config.json file being passed to the OVMS instance, but it's very long so I'll leave it until it's required!
tsmith023 changed the title from "OVModelForSeq2SeqLM for Helsinki-NLP/opus-mt-es-en slow inference times when exported to OpenVino" to "OVModelForSeq2SeqLM with Helsinki-NLP/opus-mt-es-en has slow inference times when exported to OpenVino" on Jun 8, 2023.
Apologies for the late reply; yes, MarianMT models are supported. Concerning the slow inference you're reporting: are you comparing the exported OpenVINO model with the original PyTorch model, and finding that the OpenVINO model's latency is higher?
I'm not able to reproduce this. Could you confirm that you're still observing it with:
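For reference, a rough side-by-side latency check of the original PyTorch model against the exported OpenVINO model could look like this sketch (assuming the export=True path in optimum-intel; warm-up and generation settings will affect the numbers):

```python
import time

from optimum.intel import OVModelForSeq2SeqLM
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "Helsinki-NLP/opus-mt-es-en"
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hola, como estas?", return_tensors="pt")

models = {
    "pytorch": AutoModelForSeq2SeqLM.from_pretrained(model_id),
    "openvino": OVModelForSeq2SeqLM.from_pretrained(model_id, export=True),
}

for name, model in models.items():
    model.generate(**inputs)  # warm-up run to exclude compilation overhead
    start = time.time()
    model.generate(**inputs)
    print(f"{name}: {time.time() - start:.3f}s")
```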
Hi @echarlaix, the problem didn't surface when executing within the Python runtime, but rather when running the exported OVIR binaries within OpenVINO itself, which is a C++ runtime. I was comparing the performance of the exported model within the Python runtime to its performance within the C++ runtime.
Do you feel that this issue is better suited to the OpenVINO repository? I raised it here originally since I judged it to be a problem with the model export logic. Let me know whether I should relocate it there, or whether you feel there is an implementation issue here 😁