I'm OCRing extremely compressed PDFs from NARA. Whichever of the offered image export formats I use, the output file is 5 to 10 times the size of the original. I couldn't find an option in Tesseract to keep the images in the format used by the original, non-OCRed PDF. Is there any solution or workaround in Tesseract? Maybe this is possible with the (external) hOCR-tools?
Related PR: #4171. See the discussion there. Tesseract would require code modifications that transfer the image from the input PDF to the output PDF without changing its format. If someone implements this, a pull request is welcome.
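The change described in the reply amounts to copying the already-compressed image stream into the output PDF byte for byte and declaring its original filter, instead of decoding and re-encoding it. Below is a minimal, stdlib-only Python sketch of that idea. It is not Tesseract's code (Tesseract's PDF renderer is C++), the function `build_passthrough_pdf` and its parameters are hypothetical, and it assumes you have already pulled the raw image stream plus its width, height, colour space and filter out of the source PDF. The OCR text goes on top as invisible text (render mode 3), the usual approach for searchable PDFs.

```python
def _escape_pdf_text(s: str) -> str:
    # Escape backslashes first, then parentheses, so the PDF literal string stays balanced.
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_passthrough_pdf(image_bytes: bytes, filter_name: str, width_px: int,
                          height_px: int, colorspace: str = "/DeviceGray",
                          bits_per_component: int = 8, decode_parms: str = "",
                          text: str = "", dpi: int = 300) -> bytes:
    """Hypothetical helper: one-page PDF that embeds image_bytes unchanged.

    The image is NOT decoded or re-encoded; filter_name (e.g. "/DCTDecode")
    must describe how image_bytes is already compressed.
    """
    # Page size in points, assuming the scan resolution is `dpi`.
    w_pt = width_px * 72.0 / dpi
    h_pt = height_px * 72.0 / dpi

    parms = f" /DecodeParms {decode_parms}" if decode_parms else ""
    image_dict = (
        f"<< /Type /XObject /Subtype /Image /Width {width_px} /Height {height_px}"
        f" /ColorSpace {colorspace} /BitsPerComponent {bits_per_component}"
        f" /Filter {filter_name}{parms} /Length {len(image_bytes)} >>"
    ).encode("ascii")

    # Draw the image over the whole page, then write the text invisibly (3 Tr).
    # This sketch only handles latin-1 text; real renderers map Unicode properly.
    content = (
        f"q {w_pt:.2f} 0 0 {h_pt:.2f} 0 0 cm /Im0 Do Q\n"
        f"BT 3 Tr /F0 12 Tf 10 10 Td ({_escape_pdf_text(text)}) Tj ET\n"
    ).encode("latin-1", errors="replace")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {w_pt:.2f} {h_pt:.2f}]"
         " /Resources << /XObject << /Im0 4 0 R >> /Font << /F0 6 0 R >> >>"
         " /Contents 5 0 R >>").encode("ascii"),
        # Object 4: the original compressed bytes, copied verbatim.
        image_dict + b"\nstream\n" + image_bytes + b"\nendstream",
        f"<< /Length {len(content)} >>".encode("ascii")
        + b"\nstream\n" + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded for the xref table
        out += f"{num} 0 obj\n".encode("ascii") + body + b"\nendobj\n"

    # Cross-reference table: every entry is exactly 20 bytes.
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode("ascii")
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode("ascii")
    return bytes(out)
```

Because the image stream is copied as-is, the output stays close to the input size plus the small text layer. For a bilevel CCITT Group 4 scan, one would typically pass `filter_name="/CCITTFaxDecode"`, `bits_per_component=1` and `decode_parms` such as `"<< /K -1 /Columns W /Rows H >>"`; JBIG2 streams may also need their `/JBIG2Globals` object carried over, which this sketch does not handle.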