The metadata returned by the APHIS portal recently added some new fields: zip, state, city, and certType. Now result entries look like this:
{
"certNumber": "83-R-0001",
"certType": "Class R - Research Facility",
"city": "LARAMIE",
"critical": 0,
"customerNumber": "16",
"direct": 0,
"inspectionDate": "2023-04-17",
"inspectionDateString": "4/17/2023",
"legalName": "University of Wyoming",
"nonCritical": 0,
"reportLink": "https://aphis--c.na107.content.force.com/[...]",
"siteName": "UNIVERSITY OF WYOMING",
"state": "Wyoming",
"teachableMoments": 0,
"zip": "82071"
}
This causes csv.DictWriter to raise a ValueError when writing inspections.csv, because the field names for that CSV are derived from our cached results, which lack those fields. Commit 92179d9 prevents the error in the simplest way, by passing extrasaction="ignore" when instantiating csv.DictWriter.
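To illustrate the failure and the workaround, here is a minimal, self-contained sketch. It is not the pipeline's actual code: the `write_rows` helper and the shortened `cached_fieldnames` list are hypothetical, and the row is a trimmed version of the example above. Only the `csv.DictWriter` behavior is standard library fact.

```python
import csv
import io

# Hypothetical field list, standing in for names derived from older cached
# results that predate the zip/state/city/certType fields.
cached_fieldnames = ["certNumber", "customerNumber", "inspectionDate", "legalName"]

# A newer portal result that carries the added fields.
new_row = {
    "certNumber": "83-R-0001",
    "certType": "Class R - Research Facility",
    "city": "LARAMIE",
    "customerNumber": "16",
    "inspectionDate": "2023-04-17",
    "legalName": "University of Wyoming",
    "state": "Wyoming",
    "zip": "82071",
}


def write_rows(rows, fieldnames, extrasaction="raise"):
    """Write rows to an in-memory CSV and return the text (illustrative helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction=extrasaction)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Default behavior (extrasaction="raise"): keys missing from fieldnames
# cause a ValueError.
try:
    write_rows([new_row], cached_fieldnames)
    strict_error = None
except ValueError as exc:
    strict_error = exc

# With extrasaction="ignore", the unknown keys are silently dropped.
tolerant_csv = write_rows([new_row], cached_fieldnames, extrasaction="ignore")
```

The trade-off is that `extrasaction="ignore"` also hides any future fields the portal adds, so new columns go unnoticed unless something else checks for them.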
And although we can get the same data via the records we already have (and in fact are already pulling out certificate type and state), we might still want to add these four columns.
Benefits of doing this:
A more complete reflection of the data available through the web portal.
The data could provide a useful cross-check on the same information we're extracting from the PDFs.
Costs / limitations:
It'll take some work to get these new columns backfilled for the historical data while not losing key info such as the pipeline discovery date for each inspection.
Adding these fields will make the file sizes larger, bringing us to GitHub's individual-file size limits faster. (data/combined/inspections.csv is currently ~48MB, halfway toward the 100MB limit.)
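One possible shape for the backfill, as a rough sketch only: join freshly fetched portal rows onto the existing CSV rows and copy over just the four new fields, leaving every existing column untouched. The `backfill` function, the choice of join key, and the `discovered` column name used in the usage example are all assumptions for illustration, not the repository's actual schema.

```python
NEW_FIELDS = ("certType", "city", "state", "zip")

# Assumed join key; the real pipeline may identify inspections differently.
DEFAULT_KEY_FIELDS = ("customerNumber", "certNumber", "inspectionDate", "siteName")


def backfill(existing_rows, portal_rows, key_fields=DEFAULT_KEY_FIELDS):
    """Return copies of existing_rows with NEW_FIELDS filled from portal_rows.

    Existing columns are never overwritten; rows without a portal match
    get empty strings for the new fields.
    """
    def key(row):
        return tuple(row.get(field, "") for field in key_fields)

    portal_by_key = {key(row): row for row in portal_rows}

    merged = []
    for row in existing_rows:
        updated = dict(row)  # copy, so pipeline-only columns are preserved
        match = portal_by_key.get(key(row), {})
        for field in NEW_FIELDS:
            updated[field] = match.get(field, "")
        merged.append(updated)
    return merged


# Usage with a hypothetical "discovered" column holding the discovery date.
existing = [{
    "customerNumber": "16",
    "certNumber": "83-R-0001",
    "inspectionDate": "2023-04-17",
    "siteName": "UNIVERSITY OF WYOMING",
    "discovered": "2023-05-01",
}]
portal = [{
    "customerNumber": "16",
    "certNumber": "83-R-0001",
    "inspectionDate": "2023-04-17",
    "siteName": "UNIVERSITY OF WYOMING",
    "certType": "Class R - Research Facility",
    "city": "LARAMIE",
    "state": "Wyoming",
    "zip": "82071",
}]
result = backfill(existing, portal)
```

Rows with no portal match are kept with blank values rather than dropped, so the historical record stays intact even where the portal no longer returns an inspection.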