Leverage word bbox from pdf-parser-v2 in the layout- and table-model #285

PeterStaar-IBM · 2024-11-09T07:18:31Z

Requested feature

We have much finer-grained (word-level) bbox information from docling-parse-v2, which could easily be leveraged by the layout and table models to improve their accuracy.
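As an illustration of how word-level boxes might help a table model, here is a minimal sketch that assigns each word to the predicted table cell covering the largest fraction of the word's box, then joins the words per cell in reading order. This is not docling's API or the planned implementation. `BBox`, `Word`, `assign_words_to_cells` and the 0.5 overlap threshold are hypothetical, and a top-left coordinate origin is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    # Hypothetical box type: left, top, right, bottom (top-left origin assumed).
    l: float
    t: float
    r: float
    b: float

    def area(self) -> float:
        return max(0.0, self.r - self.l) * max(0.0, self.b - self.t)

    def intersection(self, other: "BBox") -> float:
        # Area of the overlap between two boxes (0 if they do not overlap).
        w = min(self.r, other.r) - max(self.l, other.l)
        h = min(self.b, other.b) - max(self.t, other.t)
        return max(0.0, w) * max(0.0, h)


@dataclass(frozen=True)
class Word:
    # Hypothetical word record: text plus its word-level bbox.
    text: str
    bbox: BBox


def assign_words_to_cells(words, cells, min_overlap=0.5):
    """Map each word to the cell covering the largest fraction of the word's box."""
    assigned = {i: [] for i in range(len(cells))}
    for word in words:
        area = word.bbox.area()
        if area == 0:
            continue  # skip degenerate boxes
        best_idx, best_frac = None, 0.0
        for i, cell in enumerate(cells):
            frac = word.bbox.intersection(cell) / area
            if frac > best_frac:
                best_idx, best_frac = i, frac
        # Words that no cell covers sufficiently are left unassigned.
        if best_idx is not None and best_frac >= min_overlap:
            assigned[best_idx].append(word)
    # Join each cell's words top-to-bottom, then left-to-right.
    return {
        i: " ".join(w.text for w in sorted(ws, key=lambda w: (w.bbox.t, w.bbox.l)))
        for i, ws in assigned.items()
    }


if __name__ == "__main__":
    cells = [BBox(0, 0, 100, 20), BBox(100, 0, 200, 20)]
    words = [
        Word("Total", BBox(5, 2, 40, 18)),
        Word("revenue", BBox(45, 2, 95, 18)),
        Word("1,234", BBox(150, 2, 190, 18)),
    ]
    print(assign_words_to_cells(words, cells))  # {0: 'Total revenue', 1: '1,234'}
```

Assigning by the fraction of the word's own area (rather than IoU with the cell) means a small word fully inside a large cell still scores 1.0.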

maxmnemonic · 2024-11-12T17:18:04Z

Working on the implementation; it will require some refactoring of page_preprocessing_model as well as table_structure_model.

aborruso · 2024-11-12T18:15:48Z

Thank you very much!

PeterStaar-IBM · 2024-11-16T07:53:23Z

@maxmnemonic, you can leverage this new feature in docling-parse (DS4SD/docling-parse#57).

PeterStaar-IBM added the enhancement label (New feature or request) Nov 9, 2024

PeterStaar-IBM assigned maxmnemonic Nov 9, 2024

dolfim-ibm added the PDF parsing label Nov 11, 2024

This was referenced Nov 11, 2024:

Enhanced Table Extraction for Complex Formats #280 (Open)
For long tables, fields are being truncated #278 (Open)
cli and PDF: wrong table output #268 (Open)
The results of the table recognition for the example PDF are incorrect. #210 (Closed)
