Don't crash on encountering an encrypted PDF file #2182

Merged (1 commit) on Jan 19, 2024

@bemoody (Collaborator) commented Jan 19, 2024

Some people have submitted PDF training reports to PhysioNet that are encrypted with a password. I still don't know why, but at least three people did so.

We can't read them, so presumably these submissions ought to be rejected. However, we currently aren't preventing these submissions, and trying to access the console page to view/reject the submission causes an error (PDFPasswordIncorrect, which is not a subclass of PDFSyntaxError).

This pull request should treat encrypted PDF files the same as any other file that the server is unable to parse.

For longer-term solutions, see issue #2179.

This function is a best-effort attempt to get text from a PDF file:
if the file is malformed (PDFSyntaxError), this function returns an
empty string.  However, there are other exceptions such as
PDFPasswordIncorrect that can occur even if the file is well-formed.

Although it would be better to handle these exceptions at a higher
level, this is a temporary fix to allow training applications
containing encrypted files to be rejected.
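
As a rough illustration of the behavior described in the commit message, here is a minimal sketch rather than the actual patch. The exception classes below are stand-ins that only mirror the relationship stated above (PDFPasswordIncorrect is not a subclass of PDFSyntaxError); the names match exceptions found in the pdfminer.six library, but which library the project uses is an assumption here. `get_pdf_text`, `fake_extract`, and the `extract` parameter are hypothetical names introduced so the example runs without any PDF library installed.

```python
# Stand-in exception classes (hypothetical). They mirror the hierarchy
# described above: the two errors are siblings, so catching
# PDFSyntaxError alone does not catch PDFPasswordIncorrect.
class PDFException(Exception):
    pass


class PDFSyntaxError(PDFException):
    pass


class PDFPasswordIncorrect(PDFException):
    pass


def get_pdf_text(path, extract):
    """Best-effort text extraction: return '' if the file cannot be read.

    `extract` is any callable that takes a path and returns text. It is
    injected (an assumption of this sketch) so the error handling can be
    exercised without a real PDF parser.
    """
    try:
        return extract(path)
    except (PDFSyntaxError, PDFPasswordIncorrect):
        # Malformed or password-protected: treat both as unparseable.
        return ''


def fake_extract(path):
    # Hypothetical extractor used only to trigger each error path.
    if path.endswith('encrypted.pdf'):
        raise PDFPasswordIncorrect(path)
    if path.endswith('broken.pdf'):
        raise PDFSyntaxError(path)
    return 'report text'


if __name__ == '__main__':
    for name in ('ok.pdf', 'broken.pdf', 'encrypted.pdf'):
        print(name, repr(get_pdf_text(name, fake_extract)))
```

Listing the exceptions explicitly keeps unrelated failures (a missing file, for example) visible instead of silently returning an empty string, which fits the note above that broader handling belongs at a higher level.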

@tompollard merged commit 9af7eed into dev on Jan 19, 2024
11 checks passed
@tompollard deleted the bm/pdf-exception branch on January 19, 2024 20:17

@tompollard (Member) commented:

(I have pushed to the live server)
