Move apply-geolocation-rules to ingest subtree
This moves the script to the vendored directory, which is a git subtree of https://github.com/nextstrain/ingest (currently at branch apply-geolocation-rules). The file suffix is removed to match how it appears in the other repos which use it.

As per [Overview of duplicated scripts](nextstrain/ingest#1), this script also appears in:

* [monkeypox](https://github.com/nextstrain/monkeypox/blob/a1f0d7b757d323d87edcbe61c6c5ccfbdf47722c/ingest/bin/apply-geolocation-rules)
* [rsv](https://github.com/nextstrain/rsv/blob/ba171f4a43110382c38b6154be3febd50408d7bf/ingest/bin/apply-geolocation-rules)
* [dengue, branch new_ingest](https://github.com/nextstrain/dengue/blob/247b2fd897361f2548627de1d97d45fae4115c5c/ingest/bin/apply-geolocation-rules)

All three of those scripts are identical to each other. The script vendored here contains two code changes (whitespace removed from diffs):

**Ignore comment lines in the location-rules TSV**

The existing comment check indexed the first character of the stripped line, which raises an `IndexError` on a blank line. The added check skips blank lines before the comment test runs.

```diff
< if line.lstrip()[0] == '#':
---
> if line.strip()=="" or line.lstrip()[0] == '#':
```

**Allow fields to be missing from the input NDJSON**

The script previously required the input NDJSON to contain all four fields (region/country/division/location). This is relaxed here: an empty string is used for any field that is not present.

```diff
< annotated_values = transform_geolocations(geolocation_rules, [record[field] for field in location_fields])
---
> annotated_values = transform_geolocations(geolocation_rules, [record.get(field, '') for field in location_fields])
```
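The effect of the two changes can be illustrated with a minimal standalone sketch. This is not the vendored script: the helper names `is_skippable_rule_line` and `location_values` and the `LOCATION_FIELDS` tuple are hypothetical, introduced here only to isolate the two expressions shown in the diffs.

```python
# Hypothetical field order, taken from the "region/country/division/location"
# wording in the commit message; the real script may obtain these differently.
LOCATION_FIELDS = ("region", "country", "division", "location")


def is_skippable_rule_line(line: str) -> bool:
    """Return True for blank lines and comment lines in a rules TSV.

    The blank-line test comes first so that `line.lstrip()[0]` is never
    evaluated on an empty string (which would raise IndexError).
    """
    return line.strip() == "" or line.lstrip()[0] == "#"


def location_values(record: dict) -> list:
    """Collect the location fields, using '' for any field that is absent."""
    return [record.get(field, "") for field in LOCATION_FIELDS]


if __name__ == "__main__":
    for line in ["\n", "  # a comment\n", "Europe/France/*/*\tEurope/France/*/*\n"]:
        print(repr(line), is_skippable_rule_line(line))
    print(location_values({"region": "Europe", "country": "France"}))
```

With the old expressions, the blank line would raise `IndexError` and the record lacking `division` and `location` would raise `KeyError`.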