semi-automatic parser #140
Labels: enhancement (New feature or request)
Can this be closed?
JSON Parser
For now, the JSON files can be generated with the following steps:
1. Use https://croppdf.com/ to crop unnecessary white space from the PDF document.
2. Use https://products.aspose.app/pdf/table-extraction to extract the tables from the PDF into an Excel file. Converting the PDF directly to CSV tends to produce poor-quality output; creating an Excel (XLS) file first and then converting it to CSV gives a better result.
3. Review and edit the resulting file to remove unnecessary white space and inconsistencies.
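The review step above is manual for now, but part of it could be automated once the Excel file has been saved as CSV. A minimal Python sketch, assuming the first row of the exported table is the header: it collapses runs of white space inside each cell, drops rows that are entirely empty, and reports rows whose cell count differs from the header so they can be fixed by hand. `clean_table` is a hypothetical helper, not existing project code.

```python
import csv
import io


def clean_table(csv_text: str) -> tuple[list[list[str]], list[int]]:
    """Normalise white space in a CSV export and flag inconsistent rows.

    Returns (rows, bad_rows): the cleaned rows, and the 0-based indices
    (into the cleaned rows) of rows whose length differs from the header.
    """
    rows: list[list[str]] = []
    for raw in csv.reader(io.StringIO(csv_text)):
        # " a \n b " -> "a b": str.split() also handles tabs and line breaks
        cells = [" ".join(cell.split()) for cell in raw]
        # Skip blank lines and rows made only of empty cells (e.g. ",,")
        if any(cells):
            rows.append(cells)

    bad_rows: list[int] = []
    if rows:
        width = len(rows[0])  # assumption: first row is the header
        bad_rows = [i for i, row in enumerate(rows) if len(row) != width]
    return rows, bad_rows


if __name__ == "__main__":
    text = "Name , Value\n\n  alpha  ,  1 \nbeta,2,extra\n"
    rows, bad = clean_table(text)
    print(rows)  # [['Name', 'Value'], ['alpha', '1'], ['beta', '2', 'extra']]
    print(bad)   # [2]
```

Flagging rows instead of silently "fixing" them keeps the manual review in the loop: split or merged cells from PDF extraction usually need a human decision.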
The goal is to create an automatic pipeline that performs these steps.
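One possible shape for that pipeline, sketched in Python under the assumption that every step can be written as a function from one intermediate result to the next. The cropping and table-extraction steps currently rely on the web tools above and have no automated equivalent here, so the sketch starts from the CSV text of the exported table and chains the later stages up to the JSON output. The stage names (`parse_csv`, `normalise_cells`, `drop_empty_rows`, `to_json_records`, `run_pipeline`) are illustrative, not existing project code.

```python
import csv
import io
import json
from typing import Any, Callable

Step = Callable[[Any], Any]


def parse_csv(text: str) -> list[list[str]]:
    """Split CSV text (e.g. the Excel table saved as CSV) into rows."""
    return list(csv.reader(io.StringIO(text)))


def normalise_cells(rows: list[list[str]]) -> list[list[str]]:
    """Collapse runs of white space inside every cell."""
    return [[" ".join(cell.split()) for cell in row] for row in rows]


def drop_empty_rows(rows: list[list[str]]) -> list[list[str]]:
    """Remove rows whose cells are all empty."""
    return [row for row in rows if any(row)]


def to_json_records(rows: list[list[str]]) -> str:
    """Use the first row as keys and emit a JSON array of objects."""
    if not rows:
        return "[]"
    header, *body = rows
    # Note: zip() silently truncates rows longer or shorter than the header
    records = [dict(zip(header, row)) for row in body]
    return json.dumps(records, ensure_ascii=False, indent=2)


def run_pipeline(data: Any, steps: list[Step]) -> Any:
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data


# Automated cropping / table-extraction steps could later be prepended here.
STEPS: list[Step] = [parse_csv, normalise_cells, drop_empty_rows, to_json_records]

if __name__ == "__main__":
    sample = "Code, Description\n\nA1 ,  First   item\n"
    print(run_pipeline(sample, STEPS))
```

Keeping each stage a plain function makes it easy to test stages in isolation and to swap a manual step for an automated one without touching the rest of the chain.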