This repository has been archived by the owner on Jan 7, 2023. It is now read-only.

CER always bigger than 1 #6

Open
IAASSIBLCU opened this issue Dec 8, 2021 · 4 comments

Comments

@IAASSIBLCU

Help!
I am trying to reproduce this paper but ran into trouble in the training stage. I hope to get some suggestions.
Background:
Training dataset: CASIA-HWDB2.x training set (images are JPG; label .txt files are UTF-8 encoded and were created on Windows)
Code modification: replaced warp-ctc with PyTorch's native CTC loss
Problem:
According to the paper, the CER should be below 0.5. However, when I train the model with a batch size of 8, the CER stays at 1.7, and with a batch size of 4 it stays at 1, even after more than 10 epochs of training.

I have no idea how to deal with this problem. Has anyone reproduced this paper successfully? Could you give some suggestions, for example on data preprocessing?

[Screenshot of the training log]
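On the CER values in the post: CER is commonly defined as the character-level edit (Levenshtein) distance between prediction and reference divided by the number of reference characters, (S + D + I) / N. Because insertions are counted, it is not capped at 1, so a value like 1.7 means the model emits many spurious characters, while a CER stuck at exactly 1 is what you would see if, for example, every decoded string were empty (all blanks). The minimal sketch below assumes this common definition; the repository's own metric code may compute it differently, and the function names are illustrative only.

```python
def edit_distance(pred: str, ref: str) -> int:
    """Levenshtein distance between a predicted and a reference string."""
    prev = list(range(len(ref) + 1))  # distances from "" to each ref prefix
    for i, p in enumerate(pred, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # extra char in pred (insertion error)
                curr[j - 1] + 1,         # ref char missing from pred (deletion error)
                prev[j - 1] + (p != r),  # substitution, or a free match
            )
        prev = curr
    return prev[-1]


def cer(preds, refs) -> float:
    """Corpus-level CER: total edit operations / total reference characters."""
    edits = sum(edit_distance(p, r) for p, r in zip(preds, refs))
    chars = sum(len(r) for r in refs)
    return edits / chars


if __name__ == "__main__":
    print(cer(["你好"], ["你好"]))          # 0.0  (perfect match)
    print(cer([""], ["你好"]))              # 1.0  (empty output, e.g. all blanks)
    print(cer(["我我我我我"], ["你好"]))     # 2.5  (many spurious characters)
```

A CER above 1 is therefore a hint to inspect the raw decoded strings rather than a sign that the metric itself is broken.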

@bliu3650
Contributor

Thanks for your interest. :)

First of all, this work was developed during my previous job, so I can no longer share the environment information. In my spare time, though, I am trying to improve its reproducibility on my own development machine.

As for your experiment, glad to see you've got the training running. From my experience, I would suggest the following:

  • You'd better start from warp-ctc. (A long time ago we tried PyTorch's native CTC, but the results were not as good as with warp-ctc, especially for long sentences.) Of course, this is probably not the root cause of your current result.
  • Please check the training stage first. If training behaves normally, testing should too. As we can see, the loss has already dropped to 20+ after 5 epochs, which looks fine. But the predictions during training are STILL in random order. FYI, in our previous experiments the CER was close to 0.15 after 5 epochs.
  • Double-check the test/val dataset, especially the dictionary list and the mapping from images to labels.

In short, the CER issue in your post is probably caused by the decoding process, which depends heavily on how the dataset is organized. Hope this information helps. Thanks.
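Regarding decoding: greedy (best-path) CTC decoding takes the highest-scoring class at each time step, collapses consecutive repeats, drops blanks, and maps the remaining indices back to characters. The minimal sketch below assumes the blank is index 0 and that idx_to_char is the same dictionary, in the same order, as the one used to encode the training labels; if the test-time list is built differently, predictions come out garbled even when the loss looks healthy. It is an illustration, not necessarily the decoder this repository uses.

```python
def greedy_ctc_decode(frame_scores, idx_to_char, blank=0):
    """Best-path CTC decoding.

    frame_scores: T rows, each a sequence of C class scores for one time step.
    idx_to_char: list where idx_to_char[k] is the character for class k
                 (the entry at `blank` is never emitted).
    """
    # Highest-scoring class per frame (first index wins on ties).
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    out = []
    prev = None
    for k in best:
        # Emit k only if it differs from the previous frame and is not blank;
        # a blank between two equal labels lets a genuine double character through.
        if k != prev and k != blank:
            out.append(idx_to_char[k])
        prev = k
    return "".join(out)


if __name__ == "__main__":
    idx_to_char = ["", "你", "好"]  # index 0 = blank
    path = [1, 1, 0, 2, 2, 0, 2]
    frames = [[1.0 if k == p else 0.0 for k in range(3)] for p in path]
    print(greedy_ctc_decode(frames, idx_to_char))  # 你好好
```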

@IAASSIBLCU
Author

Thanks for your help; your suggestions were really helpful. I have trained the model and solved the problem. The root cause was the native PyTorch CTC, so I switched back to warp-ctc. In addition, I moved preds and the CTC loss from GPU to CPU, because the CTC loss value was always 0 when preds was on the GPU. This seems to be a known bug in warp-ctc that many people have reported here:
SeanNaren/warp-ctc#102
SeanNaren/warp-ctc#59

Thanks again!!! Have a good day.

@bliu3650
Contributor

Actually, warp-ctc can run on GPU, but it has to be recompiled whenever the CUDA version is upgraded. Users who need CUDA 11+ can refer to the link below to recompile warp-ctc. Thanks.

SeanNaren/warp-ctc/issues/182

@Qcosmo

Qcosmo commented May 28, 2022

This reply is quite useful to me.
