Return extracted_str if no templates found with extract_data() (python) ? #392

Open
Whaoo opened this issue Sep 8, 2022 · 4 comments

@Whaoo

Whaoo commented Sep 8, 2022

Hi,

Is there a way to return extracted_str (the full PDF text as a str) when no template is found for the PDF?

I saw in main.py that the extracted_str logged in debug mode is exactly what I want to collect. Getting it back would save me from calling pdf2text again and storing its output a second time.

Could extract_data() return it when no template matches the .pdf?

Many thanks
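
One possible workaround for the request above, sketched under assumptions rather than taken from the library: wrap the input module in an object that remembers the text it produced, so the text is still available when no template matches. The names `CachingInput` and `extract_with_text` are hypothetical. The sketch assumes the extraction function accepts an `input_module` argument and only calls `to_text(path)` on it.

```python
class CachingInput:
    """Wraps any object exposing to_text(path) and keeps the last result."""

    def __init__(self, inner):
        self.inner = inner
        self.last_text = None

    def to_text(self, path, *args, **kwargs):
        # Delegate to the real input module, then remember what it returned.
        self.last_text = self.inner.to_text(path, *args, **kwargs)
        return self.last_text


def extract_with_text(path, templates, extract, input_module):
    """Run `extract` once and return (result, extracted_text).

    `extract` is assumed to take (path, templates=..., input_module=...)
    and to return a falsy value when no template matches.
    """
    cache = CachingInput(input_module)
    result = extract(path, templates=templates, input_module=cache)
    return result, cache.last_text
```

With invoice2data this might be called as extract_with_text("foo.pdf", templates, extract_data, pdftotext), provided extract_data passes the wrapper through unchanged. That behaviour is an assumption and should be checked against the installed version.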

@bosd
Collaborator

bosd commented Feb 13, 2023

Is this what you are looking for, or something to get inspiration from?
I didn't test this.

https://github.com/OCA/edi/pull/399/files#diff-652ac3ae132c668bf2ac61903174bbc0c254c98bf549aac7cad47a515259ed32R70-R128

@rmilecki
Collaborator

Maybe we could make invoice2data more object oriented?

# Use static method
templates = Invoice2Data.read_templates("templates/")

i2d = Invoice2Data()
try:
    i2d.extract_data("foo.pdf", templates=templates)
except Exception as e:
    print('Failed to extract data: ' + str(e))
    print('Extracted text: ' + i2d.get_extracted_text())

@legalsylvain

Hi @rmilecki,
I'm looking for a way to get the details of the parsing error (no template found, missing required field, ...). For the time being, the information is in the log, but it is not accessible when using invoice2data as a library.

What I don't understand in your code is that, AFAIK, extract_data doesn't raise an error. Or did I miss something?

try:
    i2d.extract_data("foo.pdf", templates=templates)
except Exception as e:
    print('Failed to extract data: ' + str(e))
    print('Extracted text: ' + i2d.get_extracted_text())
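
Until the library reports failure reasons directly, the log messages mentioned above can be collected programmatically with the standard logging module. This is a minimal sketch, not part of invoice2data: `ListHandler` and `captured_logs` are hypothetical names, and it assumes the library logs through loggers named under "invoice2data" (the usual result of logging.getLogger(__name__)).

```python
import logging
from contextlib import contextmanager


class ListHandler(logging.Handler):
    """Collects (level name, message) pairs instead of printing them."""

    def __init__(self, level=logging.WARNING):
        super().__init__(level)
        self.messages = []

    def emit(self, record):
        self.messages.append((record.levelname, record.getMessage()))


@contextmanager
def captured_logs(logger_name="invoice2data", level=logging.WARNING):
    """Temporarily attach a ListHandler to the named logger.

    Records from child loggers (for example "invoice2data.main")
    propagate up to this handler.
    """
    logger = logging.getLogger(logger_name)
    handler = ListHandler(level)
    logger.addHandler(handler)
    try:
        yield handler.messages
    finally:
        logger.removeHandler(handler)
```

Wrapping the extract_data call in "with captured_logs() as messages:" would then leave any warnings and errors logged during extraction in messages. The exact message texts depend on the library version.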

@Whaoo
Author

Whaoo commented Mar 1, 2023

Hi guys, nice to see my question interests other people.
Will try using what you pushed @rmilecki :)
