-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tokenize & detokenize to client, fix typos #8
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello, thanks for the contribution. I have a few nitpicks, please toch them up, before merging.
Design feedback: The API also allows you to fetch a tokenizer and instantiate it locally. Might this fit your usecase even better?
src/detokenization.rs
Outdated
} | ||
|
||
#[derive(Serialize, Debug)] | ||
struct DetokenizationRequest<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename this to DetokenizationBody
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I renamed it to BodyDetokenization
and ResponseDetokenization
in the spirit of the existing BodyCompletion
and ResponseCompletion
.
@@ -215,6 +219,28 @@ impl Client { | |||
.output_of(&task.with_model(model), how) | |||
.await | |||
} | |||
|
|||
pub async fn tokenize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A minimal example with a comment why you want to call this would be nice. Not required though for me to merge the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I added docstring examples for tokenize()/detokenize().
Thanks a lot for your review! Hope I could address your comments. I will add docstrings for tokenize/detokenize.
That's great. Just saw that the python client supports this (aleph_alpha_client.py#L573C55-L573C79) by fetching |
- implemented client code for the `/tokenize` & `/detokenize` endpoints - added docstring examples
Squashed the changes into a single commit. |
Added an implementation for the tasks
tokenize
&detokenize
analogous to the existing other API requestsWhile the Task + Output structs are easily extensible it feels a bit cumbersome for simple requests like tokenize/detokenize. Maybe it would be worth to consider offering simpler functions like
async fn detokenize(token_ids: &Vec<u32>) -> Result<String, Error>
.