Zero height box found, cannot convert properly #33
Comments
That's strange. That will only happen when detected bboxes have a height of 0 or less (something weird with the pdf). Can you share the pdfs you're trying?
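To make the condition described above concrete, here is a minimal sketch of a check for boxes with zero or negative height. The `(x0, y0, x1, y1)` tuple layout and the function names are assumptions made for illustration, not the project's actual code.

```python
from typing import List, Sequence, Tuple

# Hypothetical bbox layout: (x0, y0, x1, y1), with y increasing downwards.
BBox = Tuple[float, float, float, float]


def bbox_height(bbox: BBox) -> float:
    """Return the height of a box as bottom edge minus top edge."""
    _, y0, _, y1 = bbox
    return y1 - y0


def find_degenerate_bboxes(bboxes: Sequence[BBox]) -> List[int]:
    """Return the indices of boxes whose height is 0 or less."""
    return [i for i, box in enumerate(bboxes) if bbox_height(box) <= 0]
```

Running a check like this over the boxes of a single page can help narrow down which page of a problem pdf triggers the error.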
@VikParuchuri
Thanks for the example, will take a look tomorrow.
Unfortunately I cannot as they are sensitive - I am getting the same issue with the pdf that @Blacktothefuture posted though
TIL that pdfs can be rotated, and the coordinates of the bboxes for the text will not be rotated accordingly. Basically, this bug is due to trying to convert pdfs that have had pages rotated. I'm looking into a fix.
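As an illustration of the mismatch described above, the sketch below maps a box from unrotated page coordinates into the coordinates of the page as displayed. It is not the fix that was pushed for this issue. It assumes a top-left origin with y increasing downwards, a clockwise rotation that is a multiple of 90 degrees, and a hypothetical helper name `rotate_bbox`.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x0, y0, x1, y1)


def rotate_bbox(bbox: BBox, width: float, height: float, rotation: int) -> BBox:
    """Map a bbox from unrotated page space to the rotated (displayed) page.

    width and height are the dimensions of the unrotated page; rotation is
    clockwise, in degrees.
    """

    def rotate_point(x: float, y: float) -> Tuple[float, float]:
        if rotation == 0:
            return x, y
        if rotation == 90:
            return height - y, x
        if rotation == 180:
            return width - x, height - y
        if rotation == 270:
            return y, width - x
        raise ValueError("rotation must be one of 0, 90, 180, 270")

    x0, y0, x1, y1 = bbox
    ax, ay = rotate_point(x0, y0)
    bx, by = rotate_point(x1, y1)
    # Rotation can swap which corner is top-left, so re-normalize the box.
    return min(ax, bx), min(ay, by), max(ax, bx), max(ay, by)


# A 40 x 20 box on a 100 x 200 page becomes a 20 x 40 box after a 90 degree rotation.
print(rotate_bbox((10, 20, 50, 40), 100, 200, 90))  # (160, 10, 180, 50)
```

The re-normalization step matters: without it, a rotated box can come out with its top edge below its bottom edge, which yields a height of 0 or less.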
good find!!
@theesfeld @Blacktothefuture I've pushed a fix to the dev branch. It works with the pdf example above. This needs more testing with a range of pdfs to make sure it works properly (and doesn't cause issues with other pdfs) before I merge it.
can I privately send you a pdf I am having issues with?
Sure, you can email [email protected] or join the about-to-launch discord and DM me (https://discord.gg//KuZwXNGnfH)
If anyone has bandwidth to test the fix currently on the
I've merged the PR, so am closing this. Please re-open if you notice any issues.
Hey @VikParuchuri
Every single PDF I have tried gets the following error:
macOS 14.2
python 3.9
installed via exact instructions from git repo