#### Motivation

Ensure that the STAC metadata published in the ODR buckets is valid, guarding against the human mistakes and system errors that can occur in a complex publication process.

#### Modification

- A daily STAC validation job that checks STAC files only
- A monthly STAC validation job that checks STAC files and assets (TIFF files)

#### Checklist

- [ ] Tests updated N/A
- [x] Docs updated
- [x] Issue linked in Title

---------

Co-authored-by: Victor Engmark <[email protected]>
1 parent 21cab6a, commit b1c8853. Showing 4 changed files with 174 additions and 5 deletions.
@@ -0,0 +1,28 @@
# Contents:

- [cron-stac-validate-fast](#cron-stac-validate-fast)
- [cron-stac-validate-full](#cron-stac-validate-full)

# STAC validation

The following [Cron Workflows](https://argo-workflows.readthedocs.io/en/stable/cron-workflows/) check the validity of the STAC metadata published in the AWS Open Data Registries [NZ Elevation](https://registry.opendata.aws/nz-elevation/) and [NZ Imagery](https://registry.opendata.aws/nz-imagery/).

> **_NOTE:_** To simplify the overall workflow deployment process, these `CronWorkflow`s have one main task per registry. This looks like avoidable duplication, but because we do not deploy the workflows with the [`argo` CLI](https://argo-workflows.readthedocs.io/en/stable/walk-through/argo-cli/), which allows parameter passing, we cannot deploy one `CronWorkflow` per `uri` (or registry).

## cron-stac-validate-fast

Workflow that validates the STAC metadata by calling the [`stac-validate` argo-tasks command](https://github.com/linz/argo-tasks/blob/master/README.md#stac-validate) using the [`tpl-at-stac-validate`](https://github.com/linz/topo-workflows/blob/master/templates/argo-tasks/README.md#argo-tasksstac-validate) template.

It also verifies that the [STAC links](https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md#link-object) are valid.

- schedule: **every day at 5am**

## cron-stac-validate-full

Workflow that validates the STAC metadata by calling the [`stac-validate` argo-tasks command](https://github.com/linz/argo-tasks/blob/master/README.md#stac-validate) using the [`stac-validate-parallel`](https://github.com/linz/topo-workflows/blob/master/workflows/stac/README.md#stac-validate-parallel) template.

It also validates the [STAC assets](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#assets). Verifying the checksums of all assets (TIFF files) is expensive, so this workflow runs less often than [cron-stac-validate-fast](#cron-stac-validate-fast).

> **_NOTE:_** Because of its parallel design, this workflow validates each `collection.json` separately and does not validate the root parent `catalog.json`. This is not an issue, as `catalog.json` does not contain any `asset` and is already validated by the [cron-stac-validate-fast](#cron-stac-validate-fast) job.

- schedule: **every 1st of the month**
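
The full workflow passes an `include` value of `collection.json$`, which is how it ends up validating each collection separately while skipping the root `catalog.json`. The sketch below illustrates that selection in Python. It assumes the value is applied as a regular-expression search against object paths, and every path in it is hypothetical, invented only for the example.

```python
import re

# Pattern copied from the `include` parameter of cron-stac-validate-full.
INCLUDE = re.compile(r"collection.json$")

# Hypothetical object paths, invented for illustration only.
paths = [
    "s3://nz-imagery/catalog.json",
    "s3://nz-imagery/region-a/dataset-1/collection.json",
    "s3://nz-imagery/region-a/dataset-1/tile-001.json",
    "s3://nz-imagery/region-a/dataset-1/tile-001.tiff",
]


def select(candidates, pattern=INCLUDE):
    """Return the candidates kept by the include pattern (regex search)."""
    return [path for path in candidates if pattern.search(path)]


selected = select(paths)
for path in selected:
    print(path)
```

Only the path ending in `collection.json` is kept. Note that the unescaped `.` matches any character, so a stricter pattern would escape it as `collection\.json$`.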
@@ -0,0 +1,52 @@
# yaml-language-server: $schema=https://raw.githubusercontent.com/argoproj/argo-workflows/v3.5.5/api/jsonschema/schema.json
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: cron-stac-validate-fast
  labels:
    linz.govt.nz/category: stac
spec:
  schedule: '0 05 * * *' # 5 AM every day
  timezone: 'NZ'
  startingDeadlineSeconds: 3600 # Allow 1 hour delay if the workflow-controller clashes during the starting time.
  concurrencyPolicy: 'Allow'
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  suspend: false
  workflowSpec:
    entrypoint: main
    arguments:
      parameters:
        - name: checksum_assets
          value: 'false'
        - name: 'checksum_links'
          value: 'true'
    templates:
      - name: main
        retryStrategy:
          limit: '0'
        steps:
          - - name: stac-validate-imagery
              templateRef:
                name: tpl-at-stac-validate
                template: main
              arguments:
                parameters:
                  - name: 'uri'
                    value: 's3://nz-imagery/catalog.json'
                  - name: 'checksum_assets'
                    value: '{{workflow.parameters.checksum_assets}}'
                  - name: 'checksum_links'
                    value: '{{workflow.parameters.checksum_links}}'
            - name: stac-validate-elevation
              templateRef:
                name: tpl-at-stac-validate
                template: main
              arguments:
                parameters:
                  - name: 'uri'
                    value: 's3://nz-elevation/catalog.json'
                  - name: 'checksum_assets'
                    value: '{{workflow.parameters.checksum_assets}}'
                  - name: 'checksum_links'
                    value: '{{workflow.parameters.checksum_links}}'
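
Each step argument takes the value of whichever workflow parameter its `{{workflow.parameters.NAME}}` placeholder names, so the name inside the placeholder decides the result, not the name of the argument. The Python sketch below is a deliberately simplified model of that substitution, not Argo's implementation. The `resolve` helper and the `PLACEHOLDER` pattern are hypothetical names introduced for the example.

```python
import re

# Workflow-level parameters as declared in cron-stac-validate-fast.
workflow_parameters = {"checksum_assets": "false", "checksum_links": "true"}

# Hypothetical pattern for a `{{workflow.parameters.NAME}}` placeholder.
PLACEHOLDER = re.compile(r"\{\{\s*workflow\.parameters\.([A-Za-z0-9_-]+)\s*\}\}")


def resolve(value, parameters):
    """Replace every workflow-parameter placeholder in value with its value."""
    return PLACEHOLDER.sub(lambda match: parameters[match.group(1)], value)


step_arguments = {
    "uri": "s3://nz-imagery/catalog.json",
    "checksum_assets": "{{workflow.parameters.checksum_assets}}",
    "checksum_links": "{{workflow.parameters.checksum_links}}",
}

resolved = {
    name: resolve(value, workflow_parameters)
    for name, value in step_arguments.items()
}
print(resolved)
```

In this model, a `checksum_links` argument whose placeholder names `checksum_assets` would resolve to `'false'` for this workflow, which is why each argument should reference the parameter of the same name.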
@@ -0,0 +1,64 @@
# yaml-language-server: $schema=https://raw.githubusercontent.com/argoproj/argo-workflows/v3.5.5/api/jsonschema/schema.json
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: cron-stac-validate-full
  labels:
    linz.govt.nz/category: stac
spec:
  schedule: '0 05 1 * *' # 5 AM every 1st of the month
  timezone: 'NZ'
  startingDeadlineSeconds: 3600 # Allow 1 hour delay if the workflow-controller clashes during the starting time.
  concurrencyPolicy: 'Allow'
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  suspend: false
  workflowSpec:
    entrypoint: main
    arguments:
      parameters:
        - name: version_argo_tasks
          value: 'v4'
        - name: 'include'
          value: 'collection.json$'
        - name: checksum_assets
          value: 'true'
        - name: 'checksum_links'
          value: 'true'
    templates:
      - name: main
        retryStrategy:
          limit: '0'
        steps:
          - - name: stac-validate-imagery
              templateRef:
                name: stac-validate-parallel
                template: main
              arguments:
                parameters:
                  - name: version_argo_tasks
                    value: '{{workflow.parameters.version_argo_tasks}}'
                  - name: 'uri'
                    value: 's3://nz-imagery/'
                  - name: include
                    value: '{{workflow.parameters.include}}'
                  - name: 'checksum_assets'
                    value: '{{workflow.parameters.checksum_assets}}'
                  - name: 'checksum_links'
                    value: '{{workflow.parameters.checksum_links}}'
            - name: stac-validate-elevation
              templateRef:
                name: stac-validate-parallel
                template: main
              arguments:
                parameters:
                  - name: version_argo_tasks
                    value: '{{workflow.parameters.version_argo_tasks}}'
                  - name: 'uri'
                    value: 's3://nz-elevation/'
                  - name: include
                    value: '{{workflow.parameters.include}}'
                  - name: 'checksum_assets'
                    value: '{{workflow.parameters.checksum_assets}}'
                  - name: 'checksum_links'
                    value: '{{workflow.parameters.checksum_links}}'
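
The two workflows differ in a single schedule field: `'0 05 * * *'` fires every day at 05:00, while `'0 05 1 * *'` fires only on the 1st of the month. As a rough illustration, the Python sketch below checks a datetime against a five-field cron expression. It is a simplified, hypothetical matcher (each field is either `*` or a single integer) and not the scheduler Argo uses. It also ignores time zones, so the datetimes stand for wall-clock times in the configured `NZ` zone.

```python
from datetime import datetime

FAST = "0 05 * * *"  # schedule of cron-stac-validate-fast
FULL = "0 05 1 * *"  # schedule of cron-stac-validate-full


def matches(schedule, when):
    """Check a five-field cron expression against a datetime.

    Simplified: each field is '*' or one integer. Ranges, lists, steps and
    cron's special day-of-month/day-of-week rule are not handled.
    """
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    # Cron counts Sunday as 0, whereas isoweekday() returns 7 for Sunday.
    actual = [when.minute, when.hour, when.day, when.month, when.isoweekday() % 7]
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


first_of_month = datetime(2024, 3, 1, 5, 0)
mid_month = datetime(2024, 3, 15, 5, 0)

print(matches(FAST, first_of_month), matches(FULL, first_of_month))
print(matches(FAST, mid_month), matches(FULL, mid_month))
```

On the 1st of a month both schedules match at the same minute, so the fast and full validations start together on that day.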