OCR-D workflow for slower processors reports errors #450

Open
stweil opened this issue Sep 12, 2024 · 1 comment

stweil commented Sep 12, 2024

I tried to apply the suggested workflow for slower processors. It failed in the last step:

20:42:49.364 INFO ocrd.task_sequence.run_tasks - Finished processing task 'cis-ocropy-dewarp -I OCR-D-SEG -O OCR-D-SEG-DEWARP -p '{"dpi": 0, "range": 4.0, "smoothness": 1.0, "max_neighbour": 0.05}''
20:42:49.436 INFO ocrd.task_sequence.run_tasks - Start processing task 'tesserocr-recognize -I OCR-D-SEG-DEWARP -O OCRD_SLOWER_PROCESSOR -p '{"textequiv_level": "glyph", "overwrite_segments": true, "model": "german_print", "dpi": 0, "padding": 0, "segmentation_level": "word", "overwrite_text": true, "shrink_polygons": false, "block_polygons": false, "find_tables": true, "find_staves": false, "sparse_text": false, "raw_lines": false, "char_whitelist": "", "char_blacklist": "", "char_unblacklist": "", "tesseract_parameters": {}, "xpath_parameters": {}, "xpath_model": {}, "auto_model": false, "oem": "DEFAULT"}''
20:44:00.802 ERROR ocrd.workspace.image_from_segment - segment "region0002_line0000" image (binarized,despeckled,binarized,dewarped; 2555x346) has not been cropped properly (2555x243)
20:44:00.968 ERROR ocrd.workspace.image_from_segment - segment "region0004_line0000" image (binarized,despeckled,binarized,dewarped; 135x122) has not been cropped properly (135x82)
20:44:01.130 ERROR ocrd.workspace.image_from_segment - segment "region0006_line0000" image (binarized,despeckled,binarized,dewarped; 1116x68) has not been cropped properly (1116x78)
20:44:01.287 ERROR ocrd.workspace.image_from_segment - segment "region0007_line0000" image (binarized,despeckled,binarized,dewarped; 907x66) has not been cropped properly (907x64)
20:44:01.600 ERROR ocrd.workspace.image_from_segment - segment "region0009_line0000" image (binarized,despeckled,binarized,dewarped; 663x74) has not been cropped properly (663x62)
20:44:01.874 ERROR ocrd.workspace.image_from_segment - segment "region0014_line0000" image (binarized,despeckled,binarized,dewarped; 300x162) has not been cropped properly (300x92)
Ignoring extant glyph: 549,1558 577,1557 577,1582 549,1583
20:44:01.979 ERROR ocrd.workspace.image_from_segment - segment "region0015_line0000" image (binarized,despeckled,binarized,dewarped; 1117x66) has not been cropped properly (1117x59)
20:44:02.114 ERROR ocrd.workspace.image_from_segment - segment "region0015_line0001" image (binarized,despeckled,binarized,dewarped; 1189x68) has not been cropped properly (1189x61)
[...]

I have not yet identified which step is the culprit, so I am reporting the error here.
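
To help narrow this down, the messages can be tallied with a small script. The following is a minimal sketch, not part of OCR-D: the regular expression is inferred only from the log lines quoted above, and the names `CROP_ERROR` and `parse_crop_errors` are made up for illustration. It extracts the segment ID, the list of image features, and both reported sizes, then computes the difference between them. In the lines quoted above, the two widths always agree and only the heights differ.

```python
import re

# Pattern for the "has not been cropped properly" messages quoted above.
# Assumption: the format is inferred from these log lines only,
# not taken from the OCR-D source code.
CROP_ERROR = re.compile(
    r'segment "(?P<segment>[^"]+)" image '
    r'\((?P<features>[^;]*); (?P<w>\d+)x(?P<h>\d+)\) '
    r'has not been cropped properly \((?P<ow>\d+)x(?P<oh>\d+)\)'
)


def parse_crop_errors(log_text):
    """Return one dict per crop-size message found in log_text."""
    results = []
    for match in CROP_ERROR.finditer(log_text):
        w, h = int(match["w"]), int(match["h"])
        ow, oh = int(match["ow"]), int(match["oh"])
        results.append({
            "segment": match["segment"],
            "features": match["features"].split(","),
            "first_size": (w, h),     # size given with the image features
            "second_size": (ow, oh),  # size given at the end of the message
            "width_delta": w - ow,
            "height_delta": h - oh,
        })
    return results


# Two of the lines from the log excerpt above.
sample = (
    '20:44:00.802 ERROR ocrd.workspace.image_from_segment - segment '
    '"region0002_line0000" image (binarized,despeckled,binarized,dewarped; '
    '2555x346) has not been cropped properly (2555x243)\n'
    '20:44:01.130 ERROR ocrd.workspace.image_from_segment - segment '
    '"region0006_line0000" image (binarized,despeckled,binarized,dewarped; '
    '1116x68) has not been cropped properly (1116x78)\n'
)

for entry in parse_crop_errors(sample):
    print(entry["segment"], entry["first_size"], entry["second_size"],
          entry["width_delta"], entry["height_delta"])
```

Running this over the full log would show whether the mismatch is confined to line heights for every segment, which may help decide whether the dewarping step or an earlier step is responsible.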

See more details here.

stweil commented Sep 13, 2024

It looks like these "errors" should be marked as "warnings", because they are not fatal: PAGE XML with text results was created.

stweil changed the title from "OCR-D workflow for slower processors fails" to "OCR-D workflow for slower processors reports errors" on Sep 14, 2024