
Data provider for collation data with icu-collator #12

Open
kipcole9 opened this issue May 20, 2023 · 2 comments


@kipcole9
Collaborator

CLDR collations are configured per-locale (typically per-language in reality) in a set of configuration files. These files need to be available to icu-collator through its data provider interface.

Including the data files in ex_cldr_collation seems reasonable. They are not large files since they represent only tailorings of the standard DUCET collation.
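Because the files are only tailorings, a provider needs data just for the locales that actually have one, and every other locale can resolve to the root collation. The sketch below shows that idea under a simplifying assumption: fallback works by dropping the last `-` subtag until a tailored locale is found, and otherwise returns root (`"und"`). The `TAILORED` list and `resolve_collation_locale` are hypothetical and are not icu-collator's API. Real CLDR fallback also has parent-locale overrides, so treat this as an illustration only.

```rust
/// Hypothetical set of locales that ship a collation tailoring.
/// Any locale not listed resolves to the root collation ("und").
const TAILORED: &[&str] = &["da", "de", "es", "sv", "zh", "zh-Hant"];

/// Resolve a locale to the locale whose tailoring should be used by
/// repeatedly stripping the last `-`-separated subtag (simplified fallback).
/// Returns "und" (the root collation) if nothing matches.
fn resolve_collation_locale(locale: &str) -> &'static str {
    let mut candidate = locale;
    loop {
        if let Some(found) = TAILORED.iter().copied().find(|t| *t == candidate) {
            return found;
        }
        match candidate.rfind('-') {
            Some(idx) => candidate = &candidate[..idx],
            None => return "und",
        }
    }
}

fn main() {
    for loc in ["es-MX", "zh-Hant-TW", "en-US", "sv"] {
        println!("{loc} -> {}", resolve_collation_locale(loc));
    }
}
```

With this list, `es-MX` resolves to the `es` tailoring, while `en-US` has no tailoring and uses root.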

Questions

  1. Does icu-collator depend on other CLDR data than these collation files?
  2. Do any of the existing data provider mechanisms in icu-collator support loading these files? If so, how is that configured?

I'll see what I can learn from reading more of the Rust docs, but I'm in deep water when it comes to that, so any suggestions you have would be warmly welcomed!

@kipcole9
Collaborator Author

Hmm, it seems there must be some mechanism to process the collator data (and other CLDR data) into a Rust format? This is based upon https://github.com/unicode-org/icu4x/tree/main/provider/testdata/data/baked/collator/data_v1

From a packaging point of view, that would mean processing all the collator locales and including the generated data in the ex_cldr_collation lib, which seems workable. I can build a mix task to generate the data for each CLDR release so it's somewhat automated.
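As a rough picture of what "processing the collator data into a Rust format" can mean, here is a generation step that turns per-locale data into Rust source that could be compiled into a library. This is a hypothetical sketch, not icu_datagen's actual output format. The `Tailoring` struct, the `COLLATION_DATA` table layout, and `generate_baked_table` are all assumptions for illustration. The table is emitted sorted by locale so that compiled code could look entries up with a binary search.

```rust
use std::fmt::Write;

/// Hypothetical input: a locale id and its serialized tailoring bytes.
struct Tailoring {
    locale: String,
    data: Vec<u8>,
}

/// Emit Rust source for a static table sorted by locale, so generated code
/// can use `binary_search_by` for lookups. Not icu_datagen's real format.
fn generate_baked_table(mut entries: Vec<Tailoring>) -> String {
    entries.sort_by(|a, b| a.locale.cmp(&b.locale));
    let mut out = String::new();
    writeln!(out, "pub static COLLATION_DATA: &[(&str, &[u8])] = &[").unwrap();
    for e in &entries {
        // `{:?}` renders the locale as a quoted string literal and the
        // bytes as `[1, 2, ...]`, both valid Rust syntax.
        writeln!(out, "    ({:?}, &{:?}),", e.locale, e.data).unwrap();
    }
    writeln!(out, "];").unwrap();
    out
}

fn main() {
    let src = generate_baked_table(vec![
        Tailoring { locale: "sv".into(), data: vec![3, 4] },
        Tailoring { locale: "da".into(), data: vec![1, 2] },
    ]);
    print!("{src}");
}
```

Rerunning such a step for each CLDR release is the kind of thing a mix task could automate, with the generated file checked into the package.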

@foxbenjaminfox
Collaborator

I don't know if you saw the comment I left on #10; perhaps it wasn't a good idea to leave it as a comment on a closed PR. Either way, I think the core thing to look at is this guide, which describes how to use icu_datagen to generate data for use with Rust's icu family of crates.
