Use poppler's pdftohtml -xml to convert PDFs into XML documents
Depending on the horizontal and vertical spacing between the arbitrary-length text
objects, which are strewn arbitrarily across the page, figure out whether each one
is a continuation of the same word or paragraph, or otherwise part of a
table
Intelligently collapse contiguous blank text objects on page breaks
Construct syntax tree to assign parts of transcript to classes?
Parse into Akoma Ntoso
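
A rough sketch of the first few steps above, in Python, assuming the XML has already been produced by `pdftohtml -xml` and follows its usual layout of `<page>` elements containing `<text top=… left=… width=… height=…>` elements. The names `TextObj`, `load_text_objects`, `classify_gap` and `collapse_blanks`, the labels returned, and every threshold (`word_gap`, `cell_gap`, `line_factor`) are hypothetical and would need tuning against real transcripts. "Collapse" is interpreted here as keeping one blank object per contiguous run, which is only one possible reading.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class TextObj:
    page: int
    top: int
    left: int
    width: int
    height: int
    text: str


def load_text_objects(xml_string):
    """Flatten every <text> element of every <page> into a list, in document order."""
    root = ET.fromstring(xml_string)
    objs = []
    for page in root.iter("page"):
        number = int(page.get("number"))
        for t in page.iter("text"):
            objs.append(TextObj(
                page=number,
                top=int(t.get("top")),
                left=int(t.get("left")),
                width=int(t.get("width")),
                height=int(t.get("height")),
                # itertext() also picks up text nested in <b>, <i> or <a> children
                text="".join(t.itertext()),
            ))
    return objs


def classify_gap(prev, cur, word_gap=2, cell_gap=30, line_factor=1.5):
    """Guess how `cur` relates to `prev` from spacing alone (thresholds are assumptions)."""
    if cur.page != prev.page:
        return "new_block"
    if abs(cur.top - prev.top) < prev.height / 2:
        # Same row: look at the horizontal gap after the previous object.
        gap = cur.left - (prev.left + prev.width)
        if gap <= word_gap:
            return "same_word"
        if gap >= cell_gap:
            return "table_cell"
        return "same_line"
    # Different row: a small downward step is treated as the same paragraph.
    step = cur.top - prev.top
    if 0 < step <= line_factor * prev.height:
        return "same_paragraph"
    return "new_block"


def collapse_blanks(objs):
    """Keep only the first object of each contiguous run of whitespace-only objects."""
    out = []
    for obj in objs:
        if obj.text.strip() == "" and out and out[-1].text.strip() == "":
            continue
        out.append(obj)
    return out


# Hand-written sample in the shape pdftohtml -xml emits (not real output).
SAMPLE = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1000" width="800">
<text top="100" left="50" width="40" height="12" font="0">Hon</text>
<text top="100" left="90" width="60" height="12" font="0">ourable</text>
<text top="100" left="300" width="50" height="12" font="0"><b>Total</b></text>
<text top="114" left="50" width="200" height="12" font="0">next line</text>
<text top="980" left="50" width="5" height="12" font="0"> </text>
</page>
<page number="2" position="absolute" top="0" left="0" height="1000" width="800">
<text top="20" left="50" width="5" height="12" font="0"> </text>
<text top="60" left="50" width="100" height="12" font="0">New page</text>
</page>
</pdf2xml>"""

if __name__ == "__main__":
    objects = load_text_objects(SAMPLE)
    for prev, cur in zip(objects, objects[1:]):
        print(repr(prev.text), "->", repr(cur.text), ":", classify_gap(prev, cur))
    print(len(collapse_blanks(objects)), "objects after collapsing blanks")
```

Spacing alone will misfire on justified text and on tables with narrow columns, so the labels are best treated as hints for the later syntax-tree step rather than final decisions.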