It seems that OpenRouter now has a few providers that serve some models at a lower quantization than others. We need to ensure that our calls to OpenRouter do not mix those quantizations across multiple requests.
OpenRouter's API is a derivative of the OpenAI API, which means we need to gate this feature for OpenRouter by either forking the OpenAI API library we use or switching to a dedicated OpenRouter API entirely.
Maybe we should also generate our own OpenAI client from the API documentation itself. That way we could at least add such custom parts ourselves, since the original API library does not expose the public methods needed to do that.
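For reference, OpenRouter documents a provider-routing extension in the request body that includes a `quantizations` filter, which is exactly the kind of OpenAI-incompatible field the stock client library cannot express. A minimal sketch of building such a request payload (field names `provider`, `quantizations`, and `allow_fallbacks` are taken from OpenRouter's provider-routing docs as I understand them; treat the exact shape as an assumption to verify):

```python
def build_openrouter_payload(model: str, messages: list[dict],
                             quantizations: list[str]) -> dict:
    """Build a chat-completions body pinned to specific quantizations.

    The `provider` object is an OpenRouter-specific routing extension,
    not part of the upstream OpenAI API, so the regular OpenAI client
    library has no public way to set it.
    """
    return {
        "model": model,
        "messages": messages,
        "provider": {
            # Only consider providers serving the model at one of these
            # quantization levels, so repeated requests stay consistent.
            "quantizations": quantizations,
            # Don't silently fall back to providers outside the filter.
            "allow_fallbacks": False,
        },
    }


payload = build_openrouter_payload(
    "meta-llama/llama-3.1-8b-instruct",
    [{"role": "user", "content": "Hello"}],
    ["fp16", "bf16"],
)
print(payload["provider"]["quantizations"])
```

This payload would then be POSTed to OpenRouter's chat-completions endpoint; the point is that gating has to happen at the request-body level, which is why forking the library or generating our own client comes up at all.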
Providers serving different quantizations:
https://openrouter.ai/models/meta-llama/llama-3.1-8b-instruct/status