tesseract picture export format #4290

bruzzler5 · 2024-08-05T14:39:45Z

Your Feature Request

Hello,

I'm running OCR on extremely compressed PDFs from NARA. With any of the offered export formats for the pictures, the output files are 5 to 10 times the size of the original. I couldn't find an option in tesseract to export the pictures in the same format as in the un-OCRed PDF. Is there a solution or workaround in tesseract? Maybe this is possible with the (external) hOCR tools?

zdenop · 2024-08-05T16:26:30Z

Use the tesseract user forum for asking questions.

stweil · 2024-08-05T18:29:27Z

Related PR: #4171. See discussion there. Tesseract would require code modifications which transfer the image from the input PDF to the output PDF without changing the format. If someone implements this, a pull request is welcome.
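
Until such a change exists, one possible workaround is to let tesseract write only the invisible text layer and then overlay that layer on the untouched original PDF, so the original compressed images are never re-encoded. The sketch below is not from this thread and only builds the two command lines. It assumes the pages have already been rendered to image files (tesseract does not read PDF input), that the external tool `qpdf` is installed, and that the text-only pages end up the same size as the original pages, which depends on the image DPI. The function names and file names are hypothetical.

```python
def text_only_pdf_cmd(page_image: str, output_base: str, lang: str = "eng") -> list[str]:
    """Build a tesseract call that writes <output_base>.pdf with only the
    invisible text layer (no page image), via the textonly_pdf variable."""
    return [
        "tesseract", page_image, output_base,
        "-l", lang,
        "-c", "textonly_pdf=1",  # omit the image, keep the OCR text
        "pdf",                    # use the PDF output config
    ]


def overlay_cmd(original_pdf: str, text_pdf: str, merged_pdf: str) -> list[str]:
    """Build a qpdf call that lays the text-only pages over the original PDF.
    Assumption: page count and page sizes of both files match."""
    return ["qpdf", original_pdf, "--overlay", text_pdf, "--", merged_pdf]


if __name__ == "__main__":
    # Print the commands instead of running them; pass them to
    # subprocess.run(cmd, check=True) once the paths are real.
    print(" ".join(text_only_pdf_cmd("page-001.png", "page-001")))
    print(" ".join(overlay_cmd("original.pdf", "page-001.pdf", "searchable.pdf")))
```

Whether the merged file stays close to the original size, and whether the text lines up with the scan, would need to be checked on a real document.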

zdenop closed this as completed Aug 5, 2024
