Scripts for bulk ingest of maps into Macrostrat
The map ingestion code written for TA4 tasks at the 6-month hackathon has been re-packaged into the following commands of the Macrostrat CLI:
macrostrat maps pipeline upload-file
macrostrat maps pipeline ingest-map
macrostrat maps pipeline ingest-csv
See ../map-integration for the implementation of these commands.
- Install Poetry.
- Select a Python 3.11+ interpreter for the Poetry environment:
  poetry env use /usr/bin/python3.11
- Install dependencies:
  poetry install --sync
- Copy macrostrat.toml.template to macrostrat.toml, copy the example section, and set each key to an appropriate value.
The import process can be divided into two phases:
- Scraping a data source for potential maps of interest. This task is specific to each data source and cannot be generalized.
- Using the data obtained in the previous step to populate Macrostrat's database and object store. This task can be generalized to work across multiple data sources.
The scripts in the macrostrat.map_staging package address the first of these two steps. Each script outputs a CSV file that can be fed into macrostrat maps pipeline ingest-csv, which addresses the second of these two steps.
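The division of labor between a staging script and the ingest command can be sketched as follows. This is an illustration only: the column names slug, name, and archive_url are hypothetical, since the columns that ingest-csv expects are not listed here, and the scraping step is reduced to an iterable of dictionaries.

```python
import csv
import io
from typing import Iterable

# Hypothetical column names, chosen for illustration only.
FIELDNAMES = ["slug", "name", "archive_url"]


def write_staging_csv(records: Iterable[dict], out) -> int:
    """Write scraped records to `out` as CSV and return the row count."""
    # extrasaction="ignore" drops any scraped fields that are not columns.
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow(record)
        count += 1
    return count


# Stand-in for the source-specific scraping step.
scraped = [
    {"slug": "a", "name": "A", "archive_url": "https://example.org/a.zip", "extra": 1},
]
buffer = io.StringIO()
print(write_staging_csv(scraped, buffer))
```

Only the scraping step would differ between data sources in a layout like this; the CSV-writing half stays the same.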
Each example below describes how to scrape a data source and produce a CSV file for the macrostrat maps pipeline ingest-csv command.
The input CSV file here was provided by the CriticalMAAS program.
poetry run python3 macrostrat/map_staging/criticalmaas_09.py data/criticalmaas_09_all.csv
The resulting output is in data/criticalmaas_09.csv.
When running macrostrat maps pipeline ingest-csv, the --filter ta1 option can be used to attempt to exclude bounding boxes and map legends.
A complete command invocation might look as follows:
poetry run macrostrat maps pipeline ingest-csv data/criticalmaas_09.csv \
--filter ta1 \
--tag "9 Month Hackathon" --tag "TA1 Output" \
--download-dir ./tmp \
--s3-bucket map-ingest --s3-prefix criticalmaas/month-09 \
| tee -a criticalmaas_09.log
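When the same invocation is repeated for several CSV files, it may be convenient to assemble the argument list in Python. The helper below is a hypothetical sketch, not part of the CLI. It uses only the options that appear in the command above and does not reproduce the tee logging.

```python
import shlex


def ingest_csv_command(csv_path, *, filter_name=None, tags=(),
                       download_dir=None, s3_bucket=None, s3_prefix=None):
    """Build the argument list for `macrostrat maps pipeline ingest-csv`."""
    argv = ["poetry", "run", "macrostrat", "maps", "pipeline", "ingest-csv", csv_path]
    if filter_name is not None:
        argv += ["--filter", filter_name]
    for tag in tags:  # --tag is given once per tag, as in the command above
        argv += ["--tag", tag]
    if download_dir is not None:
        argv += ["--download-dir", download_dir]
    if s3_bucket is not None:
        argv += ["--s3-bucket", s3_bucket]
    if s3_prefix is not None:
        argv += ["--s3-prefix", s3_prefix]
    return argv


argv = ingest_csv_command(
    "data/criticalmaas_09.csv",
    filter_name="ta1",
    tags=["9 Month Hackathon", "TA1 Output"],
    download_dir="./tmp",
    s3_bucket="map-ingest",
    s3_prefix="criticalmaas/month-09",
)
print(shlex.join(argv))  # shell-quoted form, for logging or copy-paste
```

Passing the list to subprocess.run(argv, check=True) would execute it without going through a shell, so tags containing spaces need no extra quoting.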
The input CSV file here was provided by the USGS and flags NGMDB products of interest to the CriticalMAAS program.
poetry run python3 macrostrat/map_staging/ngmdb.py data/ngmdb_usgs_records_all.csv
The resulting output is in data/ngmdb.csv.
poetry run python3 macrostrat/map_staging/arizona.py
The resulting output is in data/arizona.csv.
poetry run python3 macrostrat/map_staging/alaska.py
The resulting output is in data/alaska_all.csv. Several of these maps pose problems for Macrostrat's ingestion pipeline. Deleting the corresponding rows yields data/alaska.csv.
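The row deletion can also be scripted so that it is repeatable. The sketch below is one possible approach under stated assumptions: the identifying column slug and the excluded values are hypothetical, and the actual list of problematic maps is not given here.

```python
import csv
import io


def drop_rows(src, dst, key: str, excluded: set[str]) -> int:
    """Copy CSV rows from `src` to `dst`, skipping rows whose `key` is excluded.

    Returns the number of rows dropped.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames or [])
    writer.writeheader()
    dropped = 0
    for row in reader:
        if row[key] in excluded:
            dropped += 1
            continue
        writer.writerow(row)
    return dropped


# Hypothetical data; real rows would come from data/alaska_all.csv.
source = io.StringIO("slug,name\r\nak_1,One\r\nak_2,Two\r\n")
target = io.StringIO()
print(drop_rows(source, target, "slug", {"ak_2"}))
```

With real files, open both with newline="" as the csv module documentation recommends, reading data/alaska_all.csv and writing data/alaska.csv.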
When running macrostrat maps pipeline ingest-csv, the --filter alaska option can be used to attempt to parse additional metadata from the files contained in each archive.
poetry run python3 macrostrat/map_staging/nevada.py
The resulting output is in data/nevada.csv.