Public Rust API #28
Names and semantics should be as close to the Java version as possible.
TL;DR: A nice API for input with multiple sentences is currently blocked in stable Rust by the in-progress GATs (generic associated types) feature; see also http://lukaskalbertodt.github.io/2018/08/03/solving-the-generalized-streaming-iterator-problem-without-gats.html.
Want to have:
Problems:
What to do:
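To make the limitation concrete: `Iterator::Item` cannot borrow from the iterator itself, so an analyzer that reuses one internal buffer for every sentence cannot hand out its results through `Iterator`. A lending ("streaming") iterator with a generic associated type can. The sketch below is only an illustration: `LendingIterator`, `SentenceAnalyzer` and `sentence_lengths` are hypothetical names, and splitting on `。` and on whitespace stands in for real sentence splitting and analysis. GATs were stabilized in Rust 1.65, so this compiles on a current stable toolchain.

```rust
/// Hypothetical lending iterator: each item may borrow from the iterator.
trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

/// Hypothetical analyzer that reuses one buffer for every sentence.
struct SentenceAnalyzer<'t> {
    rest: &'t str,
    buffer: Vec<String>,
}

impl<'t> SentenceAnalyzer<'t> {
    fn new(text: &'t str) -> Self {
        Self { rest: text, buffer: Vec::new() }
    }
}

impl<'t> LendingIterator for SentenceAnalyzer<'t> {
    // The item borrows the analyzer's own buffer, which `Iterator` cannot express.
    type Item<'a> = &'a [String] where Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        if self.rest.is_empty() {
            return None;
        }
        // Stand-in sentence splitter: cut after the first '。', or take everything.
        let end = match self.rest.find('。') {
            Some(i) => i + '。'.len_utf8(),
            None => self.rest.len(),
        };
        let (sentence, rest) = self.rest.split_at(end);
        self.rest = rest;
        // Stand-in analysis: whitespace-separated tokens, written into the reused buffer.
        self.buffer.clear();
        self.buffer.extend(
            sentence
                .trim_end_matches('。')
                .split_whitespace()
                .map(str::to_owned),
        );
        Some(self.buffer.as_slice())
    }
}

/// Usage: the previous item must be dropped before `next` is called again.
fn sentence_lengths(text: &str) -> Vec<usize> {
    let mut analyzer = SentenceAnalyzer::new(text);
    let mut lengths = Vec::new();
    while let Some(tokens) = analyzer.next() {
        lengths.push(tokens.len());
    }
    lengths
}
```

The borrow checker enforces that a result is no longer in use when the next sentence is analyzed, which is the property the linked article works around without GATs.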
Split the API into a sentence splitter and analysis.
Morpheme's part_of_speech should not return an Option of a POS array; it should panic when given an invalid POS id instead.
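A minimal sketch of that change, assuming a hypothetical `Grammar` that stores POS components in a table indexed by POS id (the real Sudachi types may look different). Since POS ids come from the dictionary itself, an out-of-range id would indicate a bug rather than a condition callers should handle, so the accessor returns a plain slice and panics otherwise.

```rust
/// Hypothetical grammar table: POS id -> POS components.
struct Grammar {
    pos_list: Vec<Vec<String>>,
}

/// Hypothetical morpheme holding a reference to the grammar and its POS id.
struct Morpheme<'a> {
    grammar: &'a Grammar,
    pos_id: u16,
}

impl<'a> Morpheme<'a> {
    /// Returns the POS components.
    ///
    /// Panics if `pos_id` is not present in the grammar table.
    fn part_of_speech(&self) -> &'a [String] {
        let grammar: &'a Grammar = self.grammar;
        grammar
            .pos_list
            .get(usize::from(self.pos_id))
            .map(Vec::as_slice)
            .unwrap_or_else(|| panic!("invalid POS id: {}", self.pos_id))
    }
}
```

Callers then write `morpheme.part_of_speech()[0]` directly instead of unwrapping an `Option` at every call site.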
We want to design the public API so that Sudachi would be usable like the following.
The syntax can be a bit invalid and all names are open for discussion.
Key points of API
Because of the Python API and lifetime considerations, Model should be a thin wrapper on `Arc<RealModel>` or something like that.
Layering
We have a Rust API and a Python API with different lifetime considerations.
The Rust API should use lifetimes to safeguard against misuse and mostly use references for sharing data. Python, on the other hand, can't use Rust lifetimes and should mostly use `Arc` for sharing data.
The design proposal here is to have pointer-generic internals with thin wrappers for API types, which mostly exist for instantiating concrete types.
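One possible shape for this layering, as a sketch only: the internals are generic over any pointer that dereferences to the dictionary data, and the two API types just pick the pointer. All names here (`DictionaryData`, `TokenizerImpl`, `Tokenizer`, `PyTokenizer`) are placeholders for illustration, not existing Sudachi types.

```rust
use std::ops::Deref;
use std::sync::Arc;

/// Hypothetical immutable dictionary data shared by tokenizers.
pub struct DictionaryData {
    pub name: String,
}

/// Internals, generic over the pointer type that holds the dictionary.
struct TokenizerImpl<D: Deref<Target = DictionaryData>> {
    dict: D,
}

impl<D: Deref<Target = DictionaryData>> TokenizerImpl<D> {
    fn dictionary_name(&self) -> &str {
        &self.dict.name
    }
}

/// Rust-facing wrapper: borrows the dictionary, so lifetimes guard against misuse.
pub struct Tokenizer<'a>(TokenizerImpl<&'a DictionaryData>);

impl<'a> Tokenizer<'a> {
    pub fn new(dict: &'a DictionaryData) -> Self {
        Tokenizer(TokenizerImpl { dict })
    }

    pub fn dictionary_name(&self) -> &str {
        self.0.dictionary_name()
    }
}

/// Python-facing wrapper: shares ownership through `Arc`, no lifetime parameter.
pub struct PyTokenizer(TokenizerImpl<Arc<DictionaryData>>);

impl PyTokenizer {
    pub fn new(dict: Arc<DictionaryData>) -> Self {
        PyTokenizer(TokenizerImpl { dict })
    }

    pub fn dictionary_name(&self) -> &str {
        self.0.dictionary_name()
    }
}

/// Usage: the `Arc`-based tokenizer keeps the data alive after the original handle is gone.
pub fn shared_tokenizer_outlives_handle() -> String {
    let tokenizer = {
        let dict = Arc::new(DictionaryData { name: "core".to_string() });
        PyTokenizer::new(Arc::clone(&dict))
    };
    tokenizer.dictionary_name().to_owned()
}
```

All logic lives in the generic type, so both wrappers stay a few lines long and cannot drift apart in behavior.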
API Surface (Types)
- `Dictionary` - stores immutable data for tokenization
- `Tokenizer` - stores mutable state for tokenization
- `InputBuffer` - handles zero-copy input, sentence splitting and streaming of input data (eventually)
- `MorphemeList` - analysis result of a single block of input data
- `Morpheme` - unit of analysis result
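How these types could relate through borrowing can be sketched roughly as follows. This is not the proposed API itself: the method names, fields and lifetimes are assumptions made for illustration, the "dictionary" is just a word list, and whitespace splitting stands in for real analysis.

```rust
/// Immutable data for tokenization (here only a word list, as a placeholder).
pub struct Dictionary {
    words: Vec<String>,
}

impl Dictionary {
    pub fn new(words: &[&str]) -> Self {
        Dictionary { words: words.iter().map(|w| w.to_string()).collect() }
    }
}

/// Zero-copy view of the input text.
pub struct InputBuffer<'t> {
    text: &'t str,
}

impl<'t> InputBuffer<'t> {
    pub fn new(text: &'t str) -> Self {
        InputBuffer { text }
    }
}

/// Mutable state for tokenization; borrows the dictionary.
pub struct Tokenizer<'d> {
    dict: &'d Dictionary,
    analyzed_blocks: usize, // placeholder for reusable working state
}

/// Analysis result of a single block of input data.
pub struct MorphemeList<'d, 't> {
    dict: &'d Dictionary,
    surfaces: Vec<&'t str>,
}

/// Unit of the analysis result, a view into a `MorphemeList`.
pub struct Morpheme<'l> {
    surface: &'l str,
    known: bool,
}

impl<'d> Tokenizer<'d> {
    pub fn new(dict: &'d Dictionary) -> Self {
        Tokenizer { dict, analyzed_blocks: 0 }
    }

    /// Stand-in analysis: whitespace tokens that borrow from the input text.
    pub fn tokenize<'t>(&mut self, input: &InputBuffer<'t>) -> MorphemeList<'d, 't> {
        self.analyzed_blocks += 1;
        MorphemeList { dict: self.dict, surfaces: input.text.split_whitespace().collect() }
    }
}

impl<'d, 't> MorphemeList<'d, 't> {
    pub fn len(&self) -> usize {
        self.surfaces.len()
    }

    /// Panics if `index` is out of bounds.
    pub fn get(&self, index: usize) -> Morpheme<'_> {
        let surface = self.surfaces[index];
        let known = self.dict.words.iter().any(|w| w.as_str() == surface);
        Morpheme { surface, known }
    }
}

impl<'l> Morpheme<'l> {
    pub fn surface(&self) -> &'l str {
        self.surface
    }

    pub fn is_known(&self) -> bool {
        self.known
    }
}
```

In this arrangement the result list borrows the dictionary and the input text but not the tokenizer, so the tokenizer can be reused for the next block while earlier results are still alive.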