
Can't recognize Chinese characters at all? #159

Open
AlanFranklin258 opened this issue Jan 10, 2025 · 2 comments

Comments

@AlanFranklin258

I checked my configs, in particular that the charset contains Chinese characters, and ran train.py. The resulting model cannot recognize Chinese characters at all.
I also ran a model trained by a friend. It works on Chinese, but when I retrained on the same dataset he used, recognition failed again.
Any advice is appreciated.

@AlanFranklin258
Author

When running test.py, I found that the labels the model reads from the test set (LMDB format) have all Chinese characters stripped out, even though those characters are present in the test set in Unicode form.
I am now trying to follow #9: disable unicode normalization and retrain.
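For context, a likely cause of the stripping described above (an assumption from the symptom, not confirmed against this repo's source): NFKD normalization followed by an ASCII re-encode silently drops CJK characters, because they have no ASCII-compatible decomposition:

```python
import unicodedata

label = "价格123元"  # a label mixing Chinese characters and digits

# NFKD leaves CJK codepoints unchanged; re-encoding to ASCII with
# errors='ignore' then drops every non-ASCII character outright.
normalized = unicodedata.normalize('NFKD', label).encode('ascii', errors='ignore').decode()
print(normalized)  # -> "123"
```

If the data pipeline applies a step like this to every label, a Chinese charset in the config cannot help: the characters are already gone before the tokenizer ever sees them.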

@AlanFranklin258
Author

Following #9 and disabling unicode normalization in main.yaml really works! I still don't know how to stop test.py from stripping Chinese characters when it reads the ground-truth labels, but that doesn't matter much: I'll keep test.py for the open datasets and rewrite read.py to test my own dataset.
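For reference, the change amounts to flipping a single flag in main.yaml. The exact key name and location are assumptions; check your config for the option that #9 refers to:

```yaml
data:
  # Keep labels as-is; NFKD + ASCII re-encoding would drop CJK characters
  normalize_unicode: false
```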
