semi-automatic parser #140

Open
Gigi-G opened this issue Nov 4, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@Gigi-G
Member

Gigi-G commented Nov 4, 2023

JSON Parser

For now, we can use the following steps to generate the JSON files:

  1. Use https://croppdf.com/ to crop all unnecessary whitespace out of the PDF document.

  2. Use https://products.aspose.app/pdf/table-extraction to extract the tables from the PDF into an Excel file. Converting the PDF directly to CSV may produce poor-quality output; creating an XLS file first and then converting that file gives a better result.

  3. Review and edit the converted document to remove any remaining unnecessary whitespace and fix inconsistencies.

The goal is to create an automatic pipeline that performs these steps.
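
As a starting point for that pipeline, here is a minimal sketch of the last stage only: cleaning up whitespace and turning the converted table into JSON. It assumes the XLS file from step 2 has already been exported to CSV text with the column names in the first row. The function names (`clean_cell`, `csv_to_records`, `csv_to_json`) are hypothetical and not part of this project.

```python
import csv
import io
import json
import re


def clean_cell(value: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", value).strip()


def csv_to_records(csv_text: str) -> list[dict[str, str]]:
    """Turn CSV text into a list of cleaned row dicts.

    Assumption: the first non-empty row holds the column names.
    """
    rows = [[clean_cell(cell) for cell in row]
            for row in csv.reader(io.StringIO(csv_text))]
    # Drop rows in which every cell is empty after cleaning.
    rows = [row for row in rows if any(row)]
    if not rows:
        return []
    header, *body = rows
    records = []
    for row in body:
        # Pad short rows so every record has the same keys;
        # cells beyond the header width are ignored by zip().
        padded = row + [""] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


def csv_to_json(csv_text: str) -> str:
    """Serialize the cleaned records as pretty-printed JSON."""
    return json.dumps(csv_to_records(csv_text), ensure_ascii=False, indent=2)
```

Calling `csv_to_json(open("table.csv", encoding="utf-8").read())` on an exported file (the file name is only an example) would return the JSON text. Cropping the PDF and extracting the tables (steps 1 and 2) are not covered here and would still need a replacement for the two web tools.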

Gigi-G added the enhancement label Nov 4, 2023
@Helias
Member

Helias commented Nov 11, 2023

Can this be closed?
