Implemented computation of probability matrix #279
Is your code complete? It looks that the variables Can you give more information about the output format? I see that the files always have 156 lines with several probabilities, but none of these values seem to be equal to the ones which are output with
https://github.com/tmbdev/ocropy/wiki/OCRopus-File-Formats#lattice-files
@amitdo The output files look different. Here is an example:
His patch just outputs the raw result of the prediction. What you see with the current text/prob. options (without this patch) is the 'best' path that translate_back() found for you. The format in my link is more human-readable.
Related: #25
The number of lines (156) is the size of the codec (the number of characters) in the model you use.
Okay, then I don't think that this matrix is interesting enough for an option to
There is also the
This is intended as an extension of the --probabilities option.
Instead of printing only the probabilities of the recognised characters, --probmat outputs the complete probability matrix.
At each "timestep", the probability of every character in the codec is computed.
This could be used, for example, as input to a language model, which would then have access to the probabilities of the other characters as well.
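To make the difference between the two outputs concrete, here is a minimal sketch with made-up numbers. It assumes a matrix with one row per timestep and one column per codec entry, with index 0 acting as a blank ("no character") class. The names `CODEC`, `PROBMAT`, `best_path` and `alternatives` are hypothetical, and the greedy decoding shown here is a simplification, not ocropy's actual translate_back().

```python
# Hypothetical toy data: 4 timesteps, a codec of 3 classes.
# Assumption: index 0 is the blank ("no character") class.
CODEC = ["", "a", "b"]

PROBMAT = [
    [0.10, 0.80, 0.10],  # timestep 0
    [0.60, 0.30, 0.10],  # timestep 1
    [0.20, 0.10, 0.70],  # timestep 2
    [0.90, 0.05, 0.05],  # timestep 3
]


def best_path(probmat, codec):
    """Greedy decoding: take the most probable class at each timestep,
    collapse consecutive repeats, and drop blanks.

    Returns (character, probability) pairs, i.e. roughly the kind of
    information a per-character probability output carries.
    """
    result = []
    prev = 0
    for row in probmat:
        k = max(range(len(row)), key=row.__getitem__)
        if k != 0 and k != prev:
            result.append((codec[k], row[k]))
        prev = k
    return result


def alternatives(probmat, codec, t, n=2):
    """Return the n most probable codec entries at timestep t.

    This needs the full matrix; the best path alone does not contain it.
    """
    ranked = sorted(range(len(codec)), key=lambda k: probmat[t][k], reverse=True)
    return [(codec[k], probmat[t][k]) for k in ranked[:n]]


if __name__ == "__main__":
    print(best_path(PROBMAT, CODEC))
    print(alternatives(PROBMAT, CODEC, t=1))
```

The best path keeps one probability per recognised character, while the full matrix also exposes the runner-up candidates at every timestep, which is the information a language model could use for rescoring.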