How do people like the idea of pretrained tokenizers? #275
Unanswered
chenmoneygithub asked this question in Ideas
Replies: 1 comment 1 reply
-
I think it's a good idea, especially for me working with Kurdish, which is a low-resource language. For tokenization, I have to start from scratch every time, so if there were such a thing (a multi-lingual tokenizer) in Keras, it would be much easier.
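For context, starting from scratch here means learning a subword vocabulary from raw text before any tokenization can happen. With KerasNLP that looks roughly like the sketch below; the corpus file, vocabulary size, and sample text are hypothetical placeholders.

```python
import tensorflow as tf
import keras_nlp

# Hypothetical corpus of Kurdish text, one example per line.
corpus = tf.data.TextLineDataset("kurdish_corpus.txt")

# The step that has to be redone for every new project:
# learn a WordPiece vocabulary directly from the raw text.
vocab = keras_nlp.tokenizers.compute_word_piece_vocabulary(
    corpus,
    vocabulary_size=20_000,  # hypothetical size
    reserved_tokens=["[PAD]", "[UNK]"],
)

# Only now can a tokenizer be built and used.
tokenizer = keras_nlp.tokenizers.WordPieceTokenizer(vocabulary=vocab)
token_ids = tokenizer("silav dinya")  # sample Kurdish text
```

A pretrained multi-lingual vocabulary would let the first two steps be skipped entirely.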
1 reply
-
We think it would be helpful to provide multi-lingual tokenizers (WordPieceTokenizer, BytePairEncoder, SentencePieceTokenizer) pretrained on Wikipedia datasets. We want to see how the community likes the idea, or whether people would generally be more interested in tokenizers associated with pretrained models like BERT/GPT-2.
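As a rough illustration of the two options, a hypothetical API might look like the sketch below. Both the `from_preset` constructor and the preset names are illustrative assumptions, not a confirmed KerasNLP interface.

```python
import keras_nlp

# Option 1 (hypothetical): a standalone multi-lingual tokenizer
# pretrained on Wikipedia data, independent of any model.
tokenizer = keras_nlp.tokenizers.WordPieceTokenizer.from_preset(
    "wordpiece_multilingual_wiki"  # illustrative preset name
)

# Option 2 (hypothetical): a tokenizer tied to a pretrained model,
# i.e. the exact vocabulary BERT or GPT-2 was trained with.
bert_tokenizer = keras_nlp.models.BertTokenizer.from_preset(
    "bert_base_multilingual"  # illustrative preset name
)

# Either way, text can be tokenized with no training step.
token_ids = tokenizer("No vocabulary training needed first.")
```

The trade-off is scope: option 1 serves custom models in many languages, while option 2 guarantees compatibility with a specific pretrained checkpoint.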