Chinese traineddata does not use ISO 639-3 code #28
Comments
AFAIK Never rely on one source (especially if anybody can change it). |
Here is the official site; I should probably have linked to that instead of Wikipedia in the first place. While yes, |
The Tesseract documentation is not clear about whether ISO 639-2 or ISO 639-3 is used:
I don't mind fixing this and using ISO 639-3 everywhere, but would like to get more feedback here from people who are affected by a renaming of the models for Chinese. |
That's really unfortunate that Tesseract itself isn't even consistent about which standard it uses. |
There exist more ISO 639-3 codes for Chinese variants (at least cmn, yue, nan). |
Yes, those are for variants of the language, although I'd argue that the model is applicable to the macrolanguage Chinese, hence |
So the suggestion is that the OCR models starting with Comments from Chinese users of Tesseract are welcome. |
The Chinese models chi_sim, chi_sim_vert, chi_tra and chi_tra_vert use chi instead of zho, which is the ISO 639-3 code. Since all other files are named according to this standard, I don't see a reason why this should not be the case for Chinese.
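To make the proposed renaming concrete, here is a minimal sketch of how the four model names would map if the chi prefix (the ISO 639-2/B code) were swapped for zho (the ISO 639-3 code). The helper name to_iso_639_3_name and the constants are hypothetical, chosen only for illustration; nothing here is part of Tesseract or the tessdata repositories.

```python
# Hypothetical helper, not part of Tesseract: shows what the rename would look like.
LEGACY_PREFIX = "chi"     # ISO 639-2/B code currently used in the file names
ISO_639_3_PREFIX = "zho"  # ISO 639-3 code for the Chinese macrolanguage


def to_iso_639_3_name(model: str) -> str:
    """Return the model name with a leading 'chi' component replaced by 'zho'.

    Only the first underscore-separated component is inspected, so names
    such as 'eng' or 'chinese' are returned unchanged.
    """
    head, sep, tail = model.partition("_")
    if head == LEGACY_PREFIX:
        return ISO_639_3_PREFIX + sep + tail
    return model


if __name__ == "__main__":
    for name in ["chi_sim", "chi_sim_vert", "chi_tra", "chi_tra_vert"]:
        print(name, "->", to_iso_639_3_name(name))
```

The script suffixes (sim, tra, vert) are left untouched, since the issue only concerns the language code at the start of the name.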