
Mismatch of parameters when using OpenAI Compatible API causing alternative tokens to not be returned? #101

Open
Mithrillion opened this issue Dec 8, 2024 · 5 comments

Comments

@Mithrillion

Expected:
When using the OpenAI Compatible API, you should be able to obtain the top 10 alternative tokens and their probabilities, as when using the llama.cpp API.

Observed:
When using the OpenAI Compatible API (tested locally with oobabooga and remotely with OpenRouter), the top token options are never returned.

Possible reason:
Mikupad is trying to pass logprobs: 10 in the completion requests, but as per the OpenRouter documentation (https://openrouter.ai/docs/requests#request-headers) and this OpenRouter example, the correct parameter name should be top_logprobs, with logprobs being a boolean that acts as a switch. I think the problem may be that, when using the OpenAI Compatible API, this parameter is not recognised on the server side because of this mismatch.

@Mithrillion
Author

Mithrillion commented Dec 8, 2024

Additional note: it seems most models on OpenRouter actually do not return logprobs anyway, but some do, like the 4o models. However, even with these models, the logprobs are not being displayed in Mikupad. I tested the API and it does return a list of top token logprobs. I have yet to test whether streaming vs. full message produces different results, though.

@lmg-anon
Owner

When using the OpenAI Compatible API (tested locally with oobabooga and remotely with OpenRouter), the top token options are never returned.

Are you sure you're using an hf_* model loader in oobabooga? Unless it stopped working recently, the top tokens used to work correctly.

the correct parameter name should be top_logprobs

top_logprobs is only used for the chat completion API; the text completion API uses logprobs as per the OpenAI API reference.
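For reference, a rough sketch of how the two request shapes differ per the OpenAI API reference (the model and prompt below are just placeholders taken from this thread):

Text completion (POST /v1/completions), where logprobs is an integer:

```json
{
  "model": "openai/gpt-4o-2024-11-20",
  "prompt": "Tell me a short joke.",
  "logprobs": 10
}
```

Chat completion (POST /v1/chat/completions), where logprobs is a boolean switch and top_logprobs carries the count:

```json
{
  "model": "openai/gpt-4o-2024-11-20",
  "messages": [{ "role": "user", "content": "Tell me a short joke." }],
  "logprobs": true,
  "top_logprobs": 10
}
```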

Additional note: it seems most models on OpenRouter actually do not return logprobs anyway, but some do, like the 4o models. However, even with these models, the logprobs are not being displayed in Mikupad. I tested the API and it does return a list of top token logprobs. I have yet to test whether streaming vs. full message produces different results, though.

You mean, using the text completion API and top_logprobs in the request? If that's the case, I think we could always send the top_logprobs field as well; it shouldn't hurt the other backends.
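Just to sketch that idea (hypothetical request body, not tested against any backend), the text completion request would then carry both fields:

```json
{
  "prompt": "Tell me a short joke.",
  "logprobs": 10,
  "top_logprobs": 10
}
```

Backends that follow the text completion spec would presumably keep reading logprobs and ignore the extra field, which is why it shouldn't hurt them.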

@Mithrillion
Author

Mithrillion commented Dec 15, 2024

Are you sure you're using an hf_* model loader in oobabooga? Unless it stopped working recently, the top tokens used to work correctly.

I was not aware that only certain model loaders pass through the top token information. I assumed the API would automatically provide the same functionality as the underlying llama.cpp API. I need to double-check.

top_logprobs is only used for the chat completion API; the text completion API uses logprobs as per the OpenAI API reference.

This is quite confusing but you are right.

You mean, using the text completion API and top_logprobs in the request? If that's the case, I think we could always send the top_logprobs field as well; it shouldn't hurt the other backends.

I had a look at the response I get when I curl the API endpoint directly. The response does contain all the top token information. I wonder whether this response format conforms to what Mikupad is expecting?

test_response.json

The corresponding request is like this:

curl https://openrouter.ai/api/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer xxx" \
  -d '{
    "model": "openai/gpt-4o-2024-11-20",
    "prompt": "Tell me a short joke.",
    "temperature": 0.8,
    "logprobs": true,
    "top_logprobs": 10
  }'

I think OpenRouter may have rerouted traffic to chat completion...

@lmg-anon
Owner

lmg-anon commented Dec 16, 2024

I think OpenRouter may have rerouted traffic to chat completion...

I can confirm that OpenRouter is faking text completion by using the chat API for models like gpt-4o (which are available only through a chat API), and that's what is causing this issue.
The only way I see to solve this would be to add a checkbox like "force chat API compat" in the UI, but this feels far from ideal. The best thing to do seems to be to finish the Chat API PR instead.
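For anyone comparing raw responses, the top tokens also come back in different shapes from the two APIs, which would explain why a chat-style answer to a completions request slips past Mikupad's parsing. Roughly (values made up):

Text completion response fragment:

```json
{
  "choices": [{
    "text": " Why did the chicken cross the road?",
    "logprobs": {
      "tokens": [" Why"],
      "token_logprobs": [-0.12],
      "top_logprobs": [{ " Why": -0.12, " What": -2.3 }]
    }
  }]
}
```

Chat completion response fragment:

```json
{
  "choices": [{
    "message": { "role": "assistant", "content": "Why did the chicken cross the road?" },
    "logprobs": {
      "content": [{
        "token": "Why",
        "logprob": -0.12,
        "top_logprobs": [
          { "token": "Why", "logprob": -0.12 },
          { "token": "What", "logprob": -2.3 }
        ]
      }]
    }
  }]
}
```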

@Mithrillion
Author

Mithrillion commented Jan 7, 2025

It seems that after switching to the Chat Completion API in the configuration, the alternative tokens now show properly for models with top-token support. However, I am not able to click on an alternative token and re-generate from there. Is this a limitation of the API, or is it fixable?
