I've noticed that when I split my PDF in Firefox to get a smaller PDF (e.g. the first 10 pages), open-parse won't extract any nodes from it. The original PDF is extracted fine.
When I specify table_args, the parser does return some nodes, but all of them are identified as tables.
I am attaching the PDF; perhaps someone could have a look at what's wrong: concept-vp4360-cz.pdf
Example Code
No response
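No example was supplied with the report, so the following is only a sketch of what a reproduction might look like, not the reporter's actual script. It assumes open-parse 0.7.0 exposes `openparse.DocumentParser(table_args=...)` and `parser.parse(path)`, that the result has a `nodes` list, and that each node carries a `variant` set of labels such as "text" or "table". The function names `count_variants` and `reproduce` and the `pymupdf` table settings mentioned below are illustrative choices, not taken from the report.

```python
from collections import Counter


def count_variants(nodes):
    """Count how many nodes carry each variant label (e.g. "text", "table")."""
    counts = Counter()
    for node in nodes:
        # Assumption: each node exposes `variant` as a set of string labels.
        for label in node.variant:
            counts[label] += 1
    return dict(counts)


def reproduce(pdf_path, table_args=None):
    """Parse one PDF and summarise the node variants that come back."""
    # Imported lazily so count_variants() can be used without open-parse installed.
    import openparse

    # Assumption: DocumentParser accepts table_args as a keyword argument
    # and parse() returns an object with a `nodes` attribute.
    parser = openparse.DocumentParser(table_args=table_args)
    parsed = parser.parse(pdf_path)
    return count_variants(parsed.nodes)
```

Running `reproduce(path)` on the Firefox-split file should give an empty summary if the behaviour described above holds, while `reproduce(path, table_args={"parsing_algorithm": "pymupdf", "table_output_format": "markdown"})` should report only "table" entries. Running the same two calls on the original PDF gives a baseline for comparison.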
Python, open-parse & OS Version
python_version: 3.12.7
operating_system: Linux
os_version: 6.11.8-arch1-2
open-parse version: 0.7.0
python version: 3.12.7 (main, Oct 1 2024, 11:15:50) [GCC 14.2.1 20240910]
platform: Linux-6.11.8-arch1-2-x86_64-with-glibc2.40
related packages: torchvision-0.20.1 tokenizers-0.20.3 torch-2.5.1 pydantic-2.9.2 PyMuPDF-1.24.13 transformers-4.46.2