This guide describes how VEDA runs data ingest, transformation, and metadata (STAC) publication workflows using AWS services such as Step Functions.
NOTE: Since collection ingest still requires calling the database from a local machine, users must add their IP to an inbound rule on the security group attached to the RDS instance.
The collections/ directory holds JSON files representing VEDA collection metadata (STAC).
Each file should follow this format:
{
"id": "<collection-id>",
"type": "Collection",
"links":[
],
"title":"<collection-title>",
"extent":{
"spatial":{
"bbox":[
[
"<min-longitude>",
"<min-latitude>",
"<max-longitude>",
"<max-latitude>"
]
]
},
"temporal":{
"interval":[
[
"<start-date>",
"<end-date>"
]
]
}
},
"license":"MIT",
"description": "<collection-description>",
"stac_version": "1.0.0",
"dashboard:is_periodic": "<true/false>",
"dashboard:time_density": "<month/day/year>",
"item_assets": {
"cog_default": {
"type": "image/tiff; application=geotiff; profile=cloud-optimized",
"roles": [
"data",
"layer"
],
"title": "Default COG Layer",
"description": "Cloud optimized default layer to display on map"
}
}
}
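Before inserting a collection, it can help to check a file against the template above. The following is a minimal sketch, not part of the repository: check_collection, load_and_check, and the set of required keys are hypothetical and cover only the fields shown in the template.

```python
import json
from typing import Any

# Hypothetical: top-level keys taken from the collection template above.
REQUIRED_KEYS = {
    "id", "type", "links", "title", "extent", "license",
    "description", "stac_version", "item_assets",
}


def check_collection(collection: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a collection dict (empty if none)."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(collection))]
    if collection.get("type") != "Collection":
        problems.append('type must be "Collection"')
    extent = collection.get("extent", {})
    # Each bbox is [min-lon, min-lat, max-lon, max-lat].
    for bbox in extent.get("spatial", {}).get("bbox", []):
        if len(bbox) != 4:
            problems.append(f"bbox must have 4 values, got {len(bbox)}")
    # Each temporal interval is [start, end].
    for interval in extent.get("temporal", {}).get("interval", []):
        if len(interval) != 2:
            problems.append(f"temporal interval must have 2 values, got {len(interval)}")
    return problems


def load_and_check(path: str) -> list[str]:
    # json.load rejects trailing commas, so a malformed file fails here first.
    with open(path) as f:
        return check_collection(json.load(f))
```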
The step_function_inputs/ directory holds JSON files representing the step function inputs that initiate the discovery, ingest, and publication workflows.
Each file can contain either a single input event or a list of input events, and each event should follow this format:
{
"collection": "<collection-id>",
"discovery": "<s3/cmr>",
## for s3 discovery
"prefix": "<s3-key-prefix>",
"bucket": "<s3-bucket>",
"filename_regex": "<filename-regex>",
"datetime_range": "<month/day/year>",
## for cmr discovery
"version": "<collection-version>",
"temporal": ["<start-date>", "<end-date>"],
"bounding_box": ["<bounding-box-as-comma-separated-LBRT>"],
"include": "<filename-pattern>",
### misc
"cogify": "<true/false>",
"upload": "<true/false>",
"dry_run": "<true/false>"
}
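Because an input file can hold either one event or a list of events, a loader typically normalizes both shapes to a list. This sketch assumes the keys listed under each discovery type are required; normalize_inputs and that rule are illustrative assumptions, not the pipeline's actual validation.

```python
from typing import Any, Union

# Assumed required keys per discovery type, taken from the input template above.
S3_KEYS = {"prefix", "bucket", "filename_regex"}
CMR_KEYS = {"version", "temporal", "bounding_box"}


def normalize_inputs(payload: Union[dict[str, Any], list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Accept a single event or a list of events and return a list of events."""
    events = payload if isinstance(payload, list) else [payload]
    for event in events:
        if "collection" not in event:
            raise ValueError("every event needs a 'collection'")
        discovery = event.get("discovery")
        if discovery == "s3":
            missing = S3_KEYS - set(event)
        elif discovery == "cmr":
            missing = CMR_KEYS - set(event)
        else:
            raise ValueError(f"unknown discovery type: {discovery!r}")
        if missing:
            raise ValueError(f"{event['collection']}: missing {sorted(missing)}")
    return events
```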
Install dependencies:
poetry install
Collections are inserted by passing the collection JSON to the submit-stac lambda.
Create a collection JSON file in the data/collections/ directory (see the data section for the format), then run:
poetry run insert-collection <collection-name-start-pattern>
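A minimal sketch of how <collection-name-start-pattern> might select files, assuming it matches the beginning of the JSON filenames in data/collections/. find_collection_files and load_collections are hypothetical helpers, not the script's real code.

```python
import json
from pathlib import Path


def find_collection_files(directory: str, start_pattern: str) -> list[Path]:
    """Return JSON files in `directory` whose names start with `start_pattern`."""
    return sorted(
        path for path in Path(directory).glob("*.json")
        if path.name.startswith(start_pattern)
    )


def load_collections(directory: str, start_pattern: str) -> list[dict]:
    """Parse every matching collection file (hypothetical helper)."""
    return [json.loads(path.read_text()) for path in find_collection_files(directory, start_pattern)]
```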
Items are inserted by passing an input JSON to the discovery step function workflow.
Create an input JSON file in the data/step_function_inputs/ directory (see the data section for the format), then run:
poetry run insert-item <event-json-start-pattern>
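Starting the discovery workflow amounts to a Step Functions StartExecution call with the event as its JSON input. The sketch below uses boto3's start_execution(stateMachineArn=..., input=...); start_discovery and the state machine ARN are placeholders, and the real insert-item script may differ in how it locates the ARN or batches events.

```python
import json
from typing import Any, Protocol


class StepFunctionsClient(Protocol):
    # boto3.client("stepfunctions") satisfies this interface.
    def start_execution(self, **kwargs: Any) -> dict[str, Any]: ...


def start_discovery(sfn: StepFunctionsClient, state_machine_arn: str, events: list[dict]) -> list[str]:
    """Start one execution of the discovery workflow per input event (hypothetical helper)."""
    execution_arns = []
    for event in events:
        response = sfn.start_execution(
            stateMachineArn=state_machine_arn,
            input=json.dumps(event),  # Step Functions expects the input as a JSON string
        )
        execution_arns.append(response["executionArn"])
    return execution_arns
```

With real credentials, pass boto3.client("stepfunctions") as sfn.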
Login in...
Discovers all the files in an S3 bucket, based on the prefix and filename regex.
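The S3 discovery step can be sketched with boto3's list_objects_v2 paginator, which handles listings longer than 1,000 keys. This is an illustration, not the lambda's source: discover_s3_objects is hypothetical, and it assumes filename_regex is applied with re.match to the full object key.

```python
import re
from typing import Any, Iterator


def discover_s3_objects(s3: Any, bucket: str, prefix: str, filename_regex: str) -> Iterator[str]:
    """Yield s3:// URIs under `prefix` whose keys match `filename_regex`."""
    pattern = re.compile(filename_regex)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):  # "Contents" is absent on empty pages
            if pattern.match(obj["Key"]):  # assumption: regex applies to the full key
                yield f"s3://{bucket}/{obj['Key']}"
```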
Discovers all the files in a CMR collection that match the given version, temporal range, bounding box, and include pattern, and returns the matching objects.
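For CMR discovery, the input fields map naturally onto query parameters of the CMR granule search API (short_name, version, temporal, bounding_box). This sketch assumes the collection id is used as the CMR short_name and that include is a regular expression matched against each granule's filename; cmr_granule_params and filter_links are hypothetical helpers.

```python
import re
from typing import Optional, Sequence, Union

CMR_GRANULES_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"


def cmr_granule_params(short_name: str, version: str, temporal: Sequence[str],
                       bounding_box: Sequence[Union[float, str]]) -> dict[str, str]:
    """Build query parameters for a CMR granule search."""
    return {
        "short_name": short_name,  # assumption: collection id doubles as the CMR short name
        "version": version,
        "temporal": ",".join(temporal),  # "start,end"
        # Works for four numbers or a single "L,B,R,T" string.
        "bounding_box": ",".join(str(v) for v in bounding_box),
        "page_size": "2000",  # CMR's maximum page size
    }


def filter_links(urls: list[str], include: Optional[str]) -> list[str]:
    """Keep only URLs whose filename matches the `include` regex (all if None)."""
    if include is None:
        return list(urls)
    pattern = re.compile(include)
    return [url for url in urls if pattern.search(url.rsplit("/", 1)[-1])]
```

The returned dict can be sent as query parameters in a GET request to CMR_GRANULES_URL, and the granule links in the response then filtered with filter_links.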
Converts the input file to a COG file, writes it to S3, and returns the S3 key.
Copies the data to the VEDA MCP bucket if necessary.
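A minimal sketch of the "if necessary" check, assuming it means the object is not already in the target bucket. transfer_if_needed is hypothetical; the copy itself uses boto3's copy_object.

```python
from typing import Any


def transfer_if_needed(s3: Any, src_uri: str, target_bucket: str) -> str:
    """Copy an s3:// object into `target_bucket` unless it is already there."""
    src_bucket, _, key = src_uri.removeprefix("s3://").partition("/")
    if src_bucket == target_bucket:
        return src_uri  # already in the target bucket; nothing to copy
    s3.copy_object(
        Bucket=target_bucket,
        Key=key,  # keep the same key in the target bucket
        CopySource={"Bucket": src_bucket, "Key": key},
    )
    return f"s3://{target_bucket}/{key}"
```

copy_object handles objects up to 5 GB; for larger files, boto3's managed s3.copy(CopySource, Bucket, Key) performs a multipart copy.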
Given an object received from the STAC_READY_QUEUE, builds a STAC Item, writes it to S3, and returns the S3 key.
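The item built in this step is a STAC Item whose cog_default asset lines up with the item_assets declared in the collection template. The sketch below hand-builds a minimal STAC 1.0.0 Item; build_item is hypothetical, and a real implementation would more likely read the bbox and other metadata from the COG itself (for example with rio-stac).

```python
from typing import Any


def build_item(item_id: str, collection: str, href: str,
               bbox: list[float], datetime_str: str) -> dict[str, Any]:
    """Build a minimal STAC 1.0.0 Item with a single COG asset (hypothetical helper)."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "collection": collection,
        "bbox": bbox,
        "geometry": {  # footprint polygon derived from the bbox, counter-clockwise and closed
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north], [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": datetime_str},
        "links": [],
        "assets": {
            "cog_default": {  # matches the item_assets key in the collection template
                "href": href,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                "roles": ["data", "layer"],
            }
        },
    }
```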
Submits STAC items to the STAC Ingestor system via POST requests.
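A sketch of one such POST request using only the standard library. The endpoint URL and the bearer-token header are assumptions for illustration, and build_ingest_request is hypothetical; check the STAC Ingestor's own API documentation for the actual route and authentication.

```python
import json
import urllib.request


def build_ingest_request(ingestor_url: str, item: dict, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying one STAC item."""
    return urllib.request.Request(
        ingestor_url,
        data=json.dumps(item).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumption: token-based auth
        },
        method="POST",
    )
```

Sending it is then urllib.request.urlopen(req), which raises HTTPError on a 4xx or 5xx response.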
Reads objects from the specified queue in batches and invokes the specified step function workflow with the objects from the queue as the input.
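The queue-reader pattern can be sketched as: receive up to 10 messages (the SQS per-call limit), start one workflow execution with their bodies, and delete the messages only once the execution has started. drain_queue_batch is hypothetical, as is wrapping the batch under an "objects" key in the execution input.

```python
import json
from typing import Any


def drain_queue_batch(sqs: Any, sfn: Any, queue_url: str, state_machine_arn: str,
                      batch_size: int = 10) -> int:
    """Read one batch from SQS, start one workflow execution with it, delete the messages."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=min(batch_size, 10),  # SQS returns at most 10 per call
    )
    messages = response.get("Messages", [])
    if not messages:
        return 0
    objects = [json.loads(m["Body"]) for m in messages]
    # Hypothetical input shape: the batch wrapped under an "objects" key.
    sfn.start_execution(stateMachineArn=state_machine_arn, input=json.dumps({"objects": objects}))
    # Delete only after the execution started, so a failed start leaves messages queued.
    sqs.delete_message_batch(
        QueueUrl=queue_url,
        Entries=[{"Id": str(i), "ReceiptHandle": m["ReceiptHandle"]}
                 for i, m in enumerate(messages)],
    )
    return len(messages)
```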
Discovers all the files that need to be ingested either from s3 or cmr.
Converts the input files to COGs in parallel.
Publishes the item to the STAC database (and MCP bucket if necessary).
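Taken together, the three workflows form a discover, cogify, publish chain. In the deployed pipeline Step Functions does the orchestration; the local sketch below only illustrates the data flow, with a thread pool standing in for the parallel cogify step. run_pipeline and the three callables are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_pipeline(event: dict,
                 discover: Callable[[dict], list[str]],
                 cogify: Callable[[str], str],
                 publish: Callable[[str], str],
                 max_workers: int = 8) -> list[str]:
    """Discover inputs, convert them to COGs in parallel, then publish each result."""
    inputs = discover(event)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        cogs = list(pool.map(cogify, inputs))  # map preserves input order
    return [publish(cog) for cog in cogs]
```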