Documenting edge-case-y inspection PDFs for parser testing #22
Comments
Here's another, where the species names span multiple lines: 185280e3821720b9 (uploaded)
Another, where the species list is blank, but there's still a "Total" row: ccda727387d4c850 (uploaded)
Here's a fun one, "Page {cp} of 1": 22c3072fd5740ef1 (uploaded)
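For parser testing, the unrendered "Page {cp} of 1" footer could be handled roughly as below. This is only a minimal sketch, not the project's actual parser: the function name `parse_page_footer`, the regex, and the assumption that normal footers read "Page N of M" are all hypothetical.

```python
import re

# Assumption: normal footers look like "Page 2 of 3". The literal "{cp}"
# is treated as an unrendered placeholder for the current page number.
FOOTER_RE = re.compile(
    r"Page\s+(?P<current>\d+|\{cp\})\s+of\s+(?P<total>\d+)"
)


def parse_page_footer(text):
    """Return (current_page, total_pages), or None if no footer is found.

    current_page is None when the footer contains the literal "{cp}".
    """
    match = FOOTER_RE.search(text)
    if match is None:
        return None
    current = match.group("current")
    total = int(match.group("total"))
    return (None if current == "{cp}" else int(current), total)
```

Returning `None` for the current page, rather than raising, lets a test flag the quirk while the rest of the report still gets parsed.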
A zoo?
Indeed, lots of zoos in the data!
Closing this issue since the core related tasks are done, but will pin it for future reference.
Here's something that looks like a violation heading, but (a) does have an actual statute citation, and (b) on cross-referencing with the web portal metadata, does not actually appear to be a violation that APHIS is counting: 0db69ec135a5b244
As preparation for a more comprehensive parsing of the inspection reports, I think it'll be helpful to document some of the quirks we're seeing in the PDFs. Here's a start: