semi-automatic parser #140

Gigi-G · 2023-11-04T11:10:42Z

JSON Parser

For now, we can use the following steps to generate the JSON files:

1. Use https://croppdf.com/ to crop the unnecessary white margins from the PDF document.
2. Use https://products.aspose.app/pdf/table-extraction to export the tables from the PDF to an Excel file. Converting the PDF directly to CSV may give poor-quality output; creating an XLS file first and then converting it gives a better result.
3. Review and edit the result to remove unnecessary whitespace and inconsistencies.

The goal is to create an automatic pipeline that performs these steps.
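
As a starting point for automating the last step, here is a minimal sketch of the cleanup and JSON conversion. It assumes the Excel file has already been exported to CSV and that its first non-empty row is the header; the function names `clean_cell`, `csv_to_records`, and `csv_to_json` are hypothetical and not part of this repository.

```python
import csv
import io
import json
import re


def clean_cell(value: str) -> str:
    """Collapse runs of whitespace (including newlines) and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def csv_to_records(csv_text: str) -> list[dict[str, str]]:
    """Turn CSV text into a list of dicts keyed by the cleaned header row.

    Assumption: the first non-empty row holds the column names.
    """
    reader = csv.reader(io.StringIO(csv_text))
    rows = [[clean_cell(cell) for cell in row] for row in reader]
    # Drop rows that contain no data once the cells are cleaned.
    rows = [row for row in rows if any(row)]
    if not rows:
        return []
    header, *body = rows
    # zip() stops at the shorter sequence, so extra cells in a row are ignored.
    return [dict(zip(header, row)) for row in body]


def csv_to_json(csv_text: str) -> str:
    """Serialize the cleaned records as pretty-printed JSON."""
    return json.dumps(csv_to_records(csv_text), ensure_ascii=False, indent=2)
```

For example, `csv_to_json(open("table.csv", encoding="utf-8").read())` would return the JSON text for a hypothetical `table.csv`. The cropping and table-extraction steps are not covered here, and inconsistencies other than whitespace would still need manual review.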

Helias · 2023-11-11T21:56:52Z

Can this be closed?

Gigi-G added the enhancement (New feature or request) label Nov 4, 2023

Gigi-G mentioned this issue Nov 4, 2023

ENH semi-automatic parser #141

Merged