This repository has been archived by the owner on Sep 7, 2020. It is now read-only.

Add way to add each document's file_hash and pages value to database #11

Open
anthonydb opened this issue Feb 18, 2016 · 2 comments

Comments

@anthonydb
Owner

Upon upload, the DocumentCloud API response does not include the values for file_hash or pages, probably because those get calculated during the processing of the document and are not available when the file is dropped off.

I'd like to add a function in db.py to walk through the database of uploaded files and retrieve those values for each doc. It should include multiprocessing on supported platforms.
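A minimal sketch of what such a function might look like, not the code that ended up in db.py. It assumes a SQLite table named `documents` with `doc_id`, `file_hash` and `pages` columns, and a caller-supplied `fetch` callable that asks DocumentCloud for the processed values; all of those names are hypothetical. The `map_fn` argument is where multiprocessing would plug in.

```python
import sqlite3


def update_processed_files(conn, fetch, map_fn=map):
    """Fill in file_hash and pages for rows that do not have them yet.

    conn   -- open sqlite3 connection (table and column names are assumed)
    fetch  -- hypothetical callable: doc_id -> (doc_id, file_hash, pages)
    map_fn -- any map-like callable; the built-in map runs serially
    """
    rows = conn.execute(
        "SELECT doc_id FROM documents "
        "WHERE file_hash IS NULL OR pages IS NULL"
    ).fetchall()
    doc_ids = [row[0] for row in rows]

    # Lookups run through map_fn; only this process writes to the database.
    results = list(map_fn(fetch, doc_ids))

    conn.executemany(
        "UPDATE documents SET file_hash = ?, pages = ? WHERE doc_id = ?",
        [(file_hash, pages, doc_id) for doc_id, file_hash, pages in results],
    )
    conn.commit()
    return len(results)
```

On platforms where multiprocessing works, the lookups could be spread across workers with `with multiprocessing.Pool() as pool: update_processed_files(conn, fetch, pool.map)`. In that case `fetch` has to be a module-level function so it can be pickled; elsewhere the default serial `map` still works.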

@anthonydb
Owner Author

Closed via d71a56a

@anthonydb anthonydb reopened this Mar 7, 2016
@anthonydb
Owner Author

Reopening to remind myself that the update_processed_files method needs to write something to the database indicating that a file was not found.
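One possible shape for that, as a sketch only: a per-row status value that distinguishes a successful lookup from a missing document. The `documents` table, the `status` column, the marker strings and the `record_lookup` helper are all assumptions made for illustration, not names taken from db.py.

```python
import sqlite3

# Hypothetical marker values stored in an assumed `status` column.
NOT_FOUND = "not_found"
PROCESSED = "processed"


def record_lookup(conn, doc_id, values):
    """Store the outcome of one lookup.

    values is (file_hash, pages), or None when the document was not found.
    """
    if values is None:
        # Leave file_hash and pages untouched; only flag the row.
        conn.execute(
            "UPDATE documents SET status = ? WHERE doc_id = ?",
            (NOT_FOUND, doc_id),
        )
    else:
        file_hash, pages = values
        conn.execute(
            "UPDATE documents SET file_hash = ?, pages = ?, status = ? "
            "WHERE doc_id = ?",
            (file_hash, pages, PROCESSED, doc_id),
        )
    conn.commit()
```

Recording the miss explicitly means a later pass can skip or retry those rows instead of treating them the same as documents that have not been checked yet.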
