Commit

Add SentencePieceTokenizer and GPT2Tokenizer examples for Extensions Python converter API docs (#17708)

Co-authored-by: Sayan Shaw <[email protected]>
sayanshaw24 and Sayan Shaw authored Oct 5, 2023
1 parent 1ad850f commit fb7a2e0
Showing 1 changed file with 8 additions and 3 deletions.
11 changes: 8 additions & 3 deletions docs/extensions/index.md
````diff
@@ -99,11 +99,16 @@ If the pre processing operator is a HuggingFace tokenizer, you can also easily g
 ```python
 import onnxruntime as _ort
-from transformers import AutoTokenizer
+from transformers import AutoTokenizer, GPT2Tokenizer
 from onnxruntime_extensions import OrtPyFunction, gen_processing_models
-tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
-model = OrtPyFunction(gen_processing_models(tokenizer, pre_kwargs={})[0])
+# SentencePieceTokenizer
+spm_hf_tokenizer = AutoTokenizer.from_pretrained("t5-base", model_max_length=512)
+spm_onnx_model = OrtPyFunction(gen_processing_models(spm_hf_tokenizer, pre_kwargs={})[0])
+# GPT2Tokenizer
+gpt2_hf_tokenizer = GPT2Tokenizer.from_pretrained("Xenova/gpt-4", use_fast=False)
+gpt2_onnx_model = OrtPyFunction(gen_processing_models(gpt2_hf_tokenizer, pre_kwargs={})[0])
 ```

 For more information, you can check the API using the following:
````
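A rough usage sketch (not part of this commit): once a pre-processing model has been generated, it can be called like a regular function through `OrtPyFunction`. The details below are assumptions — the converted tokenizer is assumed to take a 1-D string tensor, and what it returns (token ids only, or ids plus masks) depends on the tokenizer and the `pre_kwargs` passed to `gen_processing_models`.

```python
import numpy as np
from transformers import AutoTokenizer
from onnxruntime_extensions import OrtPyFunction, gen_processing_models

# Hypothetical sketch: convert the T5 SentencePiece tokenizer as in the docs
# example above, then feed it a batch of strings. The input is assumed to be
# a 1-D string tensor; the output layout depends on the converted tokenizer.
spm_hf_tokenizer = AutoTokenizer.from_pretrained("t5-base", model_max_length=512)
spm_onnx_model = OrtPyFunction(gen_processing_models(spm_hf_tokenizer, pre_kwargs={})[0])

encoded = spm_onnx_model(np.array(["Hello, world!"]))
print(encoded)  # token ids produced entirely inside the ONNX graph
```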
