
Can't recognize Chinese characters at all? #159

Open
AlanFranklin258 opened this issue Jan 10, 2025 · 2 comments

Comments

@AlanFranklin258

I checked my configs, in particular that the charset contains Chinese characters, and ran train.py. The resulting model cannot recognize Chinese characters at all.
I also ran a model trained by a friend. It works on Chinese, but when I retrained on the same dataset he used, recognition failed again.
Any advice is appreciated.

@AlanFranklin258
Author

When running test.py, I found that the labels the model reads from the test set (LMDB format) have all Chinese characters stripped out, even though those characters are present in the test set in Unicode form.
I am now trying to follow #9: disable unicode normalization and retrain.
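For context, a likely cause of the stripping described above (an assumption from the symptom, not confirmed against this repo's source): NFKD normalization followed by an ASCII re-encode silently drops CJK characters, because they have no ASCII-compatible decomposition:

```python
import unicodedata

label = "价格123元"  # a label mixing Chinese characters and digits

# NFKD leaves CJK codepoints unchanged; re-encoding to ASCII with
# errors='ignore' then drops every non-ASCII character outright.
normalized = unicodedata.normalize('NFKD', label).encode('ascii', errors='ignore').decode()
print(normalized)  # -> "123"
```

If the data pipeline applies a step like this to every label, a Chinese charset in the config cannot help: the characters are already gone before the tokenizer ever sees them.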

@AlanFranklin258
Author

Following #9 and disabling unicode normalization in main.yaml really works! I still don't know how to stop test.py from stripping Chinese characters when it reads the ground-truth labels, but that doesn't matter much: I'll keep test.py for the open datasets and rewrite read.py to test my own dataset.
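For reference, the change amounts to flipping a single flag in main.yaml. The exact key name and location are assumptions; check your config for the option that #9 refers to:

```yaml
data:
  # Keep labels as-is; NFKD + ASCII re-encoding would drop CJK characters
  normalize_unicode: false
```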
