Datacube Alchemist is a command line application for performing Dataset to Dataset transformations in the context of an Open Data Cube system.
It uses a configuration file which specifies an input Product or Products, a Transformation to perform, and output parameters and destination.
Features
- Writes output to Cloud Optimised GeoTIFFs
- Easily runs within a Docker Container
- Parallelism using AWS SQS queues and Kubernetes
- Write output data to S3 or a file system
- Generates
eo3
format dataset metadata, along with processing information - Generates STAC 1.0.0.beta2 dataset metadata
- Configurable thumbnail generation
- Pass any command line options as Environment Variables
You can build the docker image locally with Docker or Docker Compose. The commands are
docker build --tag opendatacube/datacube-alchemist .
or docker-compose build
.
There's a Python setup file, so you can do pip3 install .
in the root folder. You will
need to ensure that the Open Data Cube and all its dependencies happily install though.
To run some example processes you can use the Docker Compose file to create a local workspace. To start the workspace and run an example, you can do the following:
- Export the environment variables
ODC_ACCESS_KEY
andODC_SECRET_KEY
with valid AWS credentials - Run
make up
ordocker-compose up
to start the postgres and datacube-alchemist Docker containers make initdb
to initialise the ODC database (or see the Makefile for the specific command)make metadata
will add the metadata that the Landsat example product needsmake product
will add the Landsat product definitionsmake index
will index a range of Landsat scenes to test processing withmake wofs-one
ormake fc-one
will process a single Fractional Cover or Water Observations from Space scene and output the results to the ./examples folder in this project directory
Note that the --config-file
can be a local path or a URI.
Note that --dryrun
is optional, and will run a 1/10 scale load and will not
write output to the final destination.
datacube-alchemist run-one \
--config-file ./examples/c3_config_wo.yaml \
--uuid 7b9553d4-3367-43fe-8e6f-b45999c5ada6 \
--dryrun \
Note that the final argument is a datacube expression , see Datacube Search documentation.
datacube-alchemist run-many \
--config-file ./examples/c3_config_wo.yaml \
--limit=2 \
--dryrun \
time in 2020-01
Notes on queues. To run jobs from an SQS queue, good practice is to create a deadletter queue as well as a main queue. Jobs (messages) get picked up off the main queue, and if they're successful, then they're deleted. If they aren't successful, they're not deleted, and they go back on the main queue after a defined amount of time. If this happens more than the defined number of times then the message is moved to the deadletter queue. In this way, you can track work completion.
datacube-alchemist run-from-queue \
--config-file ./examples/c3_config_wo.yaml \
--queue example-queue-name \
--limit=1 \
--queue-timeout=600 \
--dryrun
The --limit
is the total number of datasets to limit to, whereas the --product-limit
is
the number of datasets per product, in the case that you have multiple input products.
datacube-alchemist add-to-queue \
--config-file ./examples/c3_config_wo.yaml \
--queue example-queue-name \
--limit=300 \
--product-limit=100
This will get items from a deadletter queue and push them to an alive queue. Be careful, because it doesn't know what queue is what. You need to know that!
datacube-alchemist redrive-to-queue \
--queue example-from-queue \
--to-queue example-to-queue
Apache License 2.0
© 2021, Open Data Cube Community