
Add chat completion method #645

Merged: 31 commits into main, May 13, 2024
Conversation

@radames (Contributor) commented May 1, 2024

Supersedes #581. Thanks to @Wauplin, I can import the types from "@huggingface/tasks".
I've followed the pattern for textGeneration and textGenerationStream.

@radames radames requested review from coyotte508 and Wauplin May 1, 2024 04:37
@radames radames requested a review from vvmnnnkv as a code owner May 1, 2024 04:37
@radames radames mentioned this pull request May 1, 2024
@coyotte508 coyotte508 self-assigned this May 1, 2024
@radames (Contributor Author) commented May 2, 2024

One question, @coyotte508: do we want to abstract the /v1/chat/completions endpoint? If the user uses chatCompletion with one of our hosted models, say mistralai/Mistral-7B-Instruct-v0.2, should this also work?

const stream = hf.chatCompletionStream({
  model: "mistralai/Mistral-7B-Instruct-v0.2",
  messages: [{ role: "user", content: "Complete the equation 1+1= ,just the answer" }],
  max_tokens: 500,
  return_full_text: false,
  temperature: 0.1,
  seed: 0,
});
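
For reference, a minimal loop to consume that stream could look like this (a sketch; the chunk shape assumes the OpenAI-style streaming deltas that TGI returns):

let out = "";
for await (const chunk of stream) {
  // each chunk carries an incremental delta in the OpenAI streaming format
  if (chunk.choices && chunk.choices.length > 0) {
    out += chunk.choices[0].delta?.content ?? "";
  }
}
console.log(out);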

@radames (Contributor Author) commented May 2, 2024

The only issue is that now all models are served/compatible with /v1/chat/completions; we might end up with the same issue as huggingface/huggingface_hub#2094, @Wauplin.

@Wauplin (Contributor) commented May 3, 2024

> The only issue is that now all models are served/compatible with /v1/chat/completions; we might end up with the same issue

@radames not quite exactly. TGI-served models expose a /v1/chat/completions route which is fully compatible with the chat completion API. transformers-served models, on the other hand, don't have a /v1/chat/completions route, but they still accept a list of messages as input to their text-generation pipeline. In this case messages are rendered server-side, which makes the figure from huggingface/huggingface_hub#2094 (comment) obsolete.

I made the change in huggingface/huggingface_hub#2258 if you are interested in taking a look. In particular, you'll see that the transformers response is simply a text string, since it is still the text-generation pipeline. To return a proper ChatCompletionOutput object, you'll have to populate the fields manually (in my case a lot of fields are set to "dummy"). Hope this helps; please let me know if you have any remaining questions.
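
If it helps to visualize that manual population, a TypeScript sketch could look like the following (the exact field names are an assumption based on the chat completion spec; the real types live in @huggingface/tasks):

// Sketch: wrap the raw text-generation string into a chat-completion-shaped
// response. Values marked "dummy" cannot be recovered from the transformers
// pipeline output, mirroring the approach described above.
function wrapTextGenerationOutput(generatedText: string, model: string) {
  return {
    id: "dummy",
    model,
    object: "chat.completion",
    created: Math.floor(Date.now() / 1000),
    choices: [
      {
        index: 0,
        finish_reason: "unk", // the pipeline does not report a finish reason
        message: { role: "assistant", content: generatedText },
      },
    ],
    usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 }, // dummy
  };
}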

@Wauplin (Contributor) commented May 3, 2024

Oh and btw, TGI-served models have a /info route exposed (see https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-70B-Instruct/info). With this endpoint you can be 100% sure whether a model is TGI- or transformers-backed. In the current huggingface_hub implementation we rely on errors to "guess/infer" whether a model is not TGI-served; don't do this 😬 I'll open a PR to fix it on the hfh side as well.
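
A rough sketch of probing that route from JS (only the existence of /info matters here; treating any non-OK response as "not TGI" is an assumption):

// Sketch: use the /info route to tell TGI-served from transformers-served models.
async function isTgiServed(model: string, token: string): Promise<boolean> {
  const res = await fetch(
    `https://api-inference.huggingface.co/models/${model}/info`,
    { headers: { Authorization: `Bearer ${token}` } }
  );
  // transformers-served models don't expose /info, so a non-OK response
  // means the model has no /v1/chat/completions route either
  return res.ok;
}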

@radames (Contributor Author) commented May 4, 2024

> > The only issue is that now all models are served/compatible with /v1/chat/completions; we might end up with the same issue
>
> @radames not quite exactly. TGI-served models expose a /v1/chat/completions route which is fully compatible with the chat completion API. transformers-served models, on the other hand, don't have a /v1/chat/completions route, but they […]

Thanks @Wauplin, yes, I miswrote my sentence. I meant to say that not all models are served with a /v1/chat/completions route, e.g. mixedbread-ai/mxbai-embed-large-v1. Exactly your point!

And I think we should copy this logic to complete the model URL, in case one provides only a model ID:

model_url = self._resolve_url(model)
if not model_url.endswith("/chat/completions"):
    model_url += "/v1/chat/completions"
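
A direct TypeScript port of that logic might look like this (resolveUrl is a hypothetical stand-in for however the client maps a model ID to a base URL):

// Sketch: complete the model URL when the caller provides only a model id.
let modelUrl = resolveUrl(model); // hypothetical resolver: model id -> base URL
if (!modelUrl.endsWith("/chat/completions")) {
  modelUrl += "/v1/chat/completions";
}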

@radames (Contributor Author) left a comment

I've added a chatCompletion hint; feel free to change it if there is a better way to send that to the request call.

packages/inference/test/HfInference.spec.ts (outdated)
@coyotte508 (Member) commented:

Can you update the tests in HfInference.spec.ts to use the chatCompletionStream method instead of streamingRequest?

(also check my README changes)

It should be good afterwards

@radames (Contributor Author) commented May 8, 2024

> Can you update the tests in HfInference.spec.ts to use the chatCompletionStream method instead of streamingRequest?
>
> (also check my README changes)
>
> It should be good afterwards

@coyotte508 for the external providers, it has to be streamingRequest, since they don't support receiving the extra arguments that come with chatCompletionStream, which sends options params that are compatible with our serverless API and TGI.


@coyotte508 coyotte508 dismissed their stale review May 9, 2024 11:55

Need to see exactly what options are sent to other endpoints and whether they can be removed

README.md (outdated)
@radames (Contributor Author) commented May 9, 2024

@coyotte508 here is the issue: when we use predefined tasks, such as textGeneration, which are compatible with our serverless API (https://huggingface.co/docs/api-inference/en/detailed_parameters), the options parameter is valid. That's why in my example I use the custom request, since it won't send any options:

: JSON.stringify({
    ...(otherArgs.model && isUrl(otherArgs.model) ? omit(otherArgs, "model") : otherArgs),
    ...(otherOptions && !isObjectEmpty(otherOptions) && { options: otherOptions }),
  }),

@coyotte508 (Member) commented May 10, 2024

> @coyotte508 here is the issue: when we use predefined tasks, such as textGeneration, which are compatible with our serverless API (huggingface.co/docs/api-inference/en/detailed_parameters), the options parameter is valid. That's why in my example I use the custom request, since it won't send any options:
>
> : JSON.stringify({
>     ...(otherArgs.model && isUrl(otherArgs.model) ? omit(otherArgs, "model") : otherArgs),
>     ...(otherOptions && !isObjectEmpty(otherOptions) && { options: otherOptions }),
>   }),

Yes, I just want to know what the value of otherOptions is: what options we send, and whether they're necessary or could be sent via headers instead.

We could also detect mistral/openai in the domain name and remove the options in that case.
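
For instance, a rough sketch of that detection (the host list is illustrative, not exhaustive):

// Sketch: skip the HF-specific `options` payload when the endpoint belongs to
// a third-party provider that rejects unknown body fields.
const THIRD_PARTY_HOSTS = ["api.openai.com", "api.mistral.ai"]; // illustrative
function shouldSendOptions(endpointUrl: string): boolean {
  const host = new URL(endpointUrl).hostname;
  return !THIRD_PARTY_HOSTS.some((h) => host === h || host.endsWith("." + h));
}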

@radames (Contributor Author) commented May 10, 2024

Great point. We might not need otherOptions. It seems like we can do it via headers, and we're already doing it.

@coyotte508 (Member) commented:

> Great point. We might not need otherOptions. It seems like we can do it via headers, and we're already doing it.

@radames can you run tests again with VCR_MODE=cache to record the new tapes.json?

Hopefully it should be good afterwards

@radames (Contributor Author) commented May 11, 2024

Thanks @coyotte508, done! I ran the tests only for the new API: VCR_MODE=cache pnpm exec vitest run -t "OpenAI Specs"

@julien-c (Member) left a comment

yay! this is close to being merged? i want to play with it :)

@radames (Contributor Author) commented May 13, 2024

I think it just needs the final OK from @coyotte508, then it's ready to be merged and released. I'm also waiting for this to build a couple of things.

@coyotte508 (Member) commented:

Thanks @radames!

@coyotte508 coyotte508 merged commit f78bf7a into main May 13, 2024
5 checks passed
@coyotte508 coyotte508 deleted the chatCompletion branch May 13, 2024 09:06
mishig25 pushed a commit that referenced this pull request May 15, 2024
Thanks to #645, it is now possible to use `chatCompletionStream` in the Conversational Widget.


![image](https://github.com/huggingface/huggingface.js/assets/11827707/efa05c3d-5a14-4564-9d50-40de25f0a21b)
@Wauplin Wauplin mentioned this pull request Jun 14, 2024