refactor: templating stac-validate workflow TDE-1136 (#530)
#### Motivation

`stac-validate` is used in many places; it would be easier to maintain
and reuse as a "template".

#### Modification

Create a `stac-validate` template so it can be reused by workflows.

#### Checklist

_If not applicable, provide explanation of why._

- [ ] Tests updated
- [x] Docs updated
- [x] Issue linked in Title
paulfouquet authored Apr 18, 2024
1 parent 5f578a6 commit 43af2ea
Showing 5 changed files with 96 additions and 64 deletions.
26 changes: 25 additions & 1 deletion templates/argo-tasks/README.md
@@ -178,7 +178,7 @@ See https://github.com/linz/argo-tasks#stac-github-import
 Template to build ODR target paths using collection metadata.
 See https://github.com/linz/argo-tasks#generate-paths

-## Template Usage
+### Template Usage

 ```yaml
 name: generate-path
@@ -194,3 +194,27 @@ arguments:
 - name: source
 value: '{{inputs.parameters.source}}'
 ```
+
+## argo-tasks/stac-validate
+
+Template to validate STAC Collections and Items against [STAC](https://stacspec.org/) schemas and STAC Extension schemas.
+See (https://github.com/linz/argo-tasks#stac-validate)
+
+### Template Usage
+
+```yaml
+- name: stac-validate
+  templateRef:
+    name: tpl-at-stac-validate
+    template: main
+  arguments:
+    parameters:
+      - name: uri
+        value: 's3://my-bucket/path/collection.json'
+      - name: checksum
+        value: '{{workflow.parameters.checksum}}'
+      - name: recursive
+        value: '{{workflow.parameters.recursive}}'
+      - name: concurrency
+        value: '20'
+```
55 changes: 55 additions & 0 deletions templates/argo-tasks/stac-validate.yml
@@ -0,0 +1,55 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/argoproj/argo-workflows/v3.5.5/api/jsonschema/schema.json
+
+apiVersion: argoproj.io/v1alpha1
+kind: WorkflowTemplate
+metadata:
+  # Template from linz/argo-tasks
+  # see https://github.com/linz/argo-tasks?tab=readme-ov-file#stac-validate
+  name: tpl-at-stac-validate
+spec:
+  templateDefaults:
+    container:
+      imagePullPolicy: Always
+      image: ''
+  entrypoint: main
+  templates:
+    - name: main
+      inputs:
+        parameters:
+          - name: uri
+            description: STAC file uri to validate
+            default: ''
+
+          - name: recursive
+            description: Follow and validate STAC links
+            default: 'true'
+
+          - name: concurrency
+            description: Number of requests to run concurrently
+            default: '50'
+
+          - name: checksum
+            description: Validate the file:checksum if it exists
+            default: 'false'
+
+          - name: version
+            description: container version to use
+            default: 'v3'
+
+      container:
+        image: '019359803926.dkr.ecr.ap-southeast-2.amazonaws.com/argo-tasks:{{=sprig.trim(inputs.parameters.version)}}'
+        resources:
+          requests:
+            cpu: 15000m
+            memory: 7.8Gi
+        command: [node, /app/index.js]
+        env:
+          - name: AWS_ROLE_CONFIG_PATH
+            value: s3://linz-bucket-config/config.json
+        args:
+          - 'stac'
+          - 'validate'
+          - '--concurrency={{inputs.parameters.concurrency}}'
+          - '--recursive={{inputs.parameters.recursive}}'
+          - '--checksum={{inputs.parameters.checksum}}'
+          - '{{inputs.parameters.uri}}'
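To make the parameter substitution concrete, the fragment below sketches what the `main` template's container would resolve to when a caller supplies only `uri` and every other input keeps its default. The bucket path is the placeholder from the README example, not a real location, and the fragment shows resolved values rather than anything that exists in the repository.

```yaml
# Sketch only: resolved container fields for
#   uri = 's3://my-bucket/path/collection.json'   (placeholder value)
# with recursive, concurrency, checksum and version left at their defaults.
container:
  image: '019359803926.dkr.ecr.ap-southeast-2.amazonaws.com/argo-tasks:v3'
  command: [node, /app/index.js]
  args:
    - 'stac'
    - 'validate'
    - '--concurrency=50'
    - '--recursive=true'
    - '--checksum=false'
    - 's3://my-bucket/path/collection.json'
```

Every flag is passed in `--name=value` form, so each input parameter maps to exactly one argument.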
21 changes: 6 additions & 15 deletions workflows/raster/standardising.yaml
@@ -276,6 +276,7 @@ spec:
       - name: target_bucket_name
         value: ''
         enum:
+          - ''
           - 'nz-imagery'
           - 'nz-elevation'
           - ''
@@ -362,11 +363,13 @@ spec:
             depends: 'standardise-validate'

           - name: stac-validate
-            template: stac-validate
+            templateRef:
+              name: tpl-at-stac-validate
+              template: main
             arguments:
               parameters:
-                - name: location
-                  value: '{{tasks.get-location.outputs.parameters.location}}'
+                - name: uri
+                  value: '{{tasks.get-location.outputs.parameters.location}}flat/collection.json'
               artifacts:
                 - name: stac-result
                   raw:
@@ -542,18 +545,6 @@ spec:
           - '--concurrency'
           - '25'

-    - name: stac-validate
-      inputs:
-        parameters:
-          - name: location
-      container:
-        image: '019359803926.dkr.ecr.ap-southeast-2.amazonaws.com/argo-tasks:{{=sprig.trim(workflow.parameters.version_argo_tasks)}}'
-        command: [node, /app/index.js]
-        env:
-          - name: AWS_ROLE_CONFIG_PATH
-            value: s3://linz-bucket-config/config.json
-        args: ['stac', 'validate', '--recursive', '{{inputs.parameters.location}}flat/collection.json']
-
     - name: get-location
       script:
         image: '019359803926.dkr.ecr.ap-southeast-2.amazonaws.com/argo-tasks:{{=sprig.trim(workflow.parameters.version_argo_tasks)}}'
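In this workflow the `stac-validate` task now passes only `uri`, so the shared template's defaults apply to everything else (`recursive` 'true', `concurrency` '50', `checksum` 'false', `version` 'v3'). The removed inline template instead took its image tag from `workflow.parameters.version_argo_tasks`. If a caller wanted to keep pinning the container to that workflow parameter, one possible variation (an assumption for illustration, not part of this commit) might look like:

```yaml
- name: stac-validate
  templateRef:
    name: tpl-at-stac-validate
    template: main
  arguments:
    parameters:
      - name: uri
        value: '{{tasks.get-location.outputs.parameters.location}}flat/collection.json'
      # Assumption: forward the workflow-level argo-tasks version
      # instead of relying on the template default 'v3'.
      - name: version
        value: '{{workflow.parameters.version_argo_tasks}}'
```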
23 changes: 2 additions & 21 deletions workflows/stac/README.md
@@ -1,25 +1,6 @@
-# Contents
+# stac-validate-parallel

-- [stac-validate](#stac-validate)
-
-# stac-validate
-
-Validate STAC Collections and Items against [STAC](https://stacspec.org/) schemas and STAC Extension schemas.
-Uses the [argo-tasks](https://github.com/linz/argo-tasks#stac-validate) container `stac-validate` command.
-
-## Workflow Input Parameters
-
-| Parameter | Type | Default | Description |
-| --------- | ----- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
-| uri | str | s3://linz-imagery-staging/test/stac-validate/ | The full AWS S3 URI (path) to the STAC file(s) |
-| include | regex | `collection.json$` | Regular expression to match object path(s) or name(s) from within the source path to include in STAC validation. |
-| checksum | enum | false | Set to "true" to validate the checksums of linked asset files. |
-
-The `--recursive` flag is specified inside the STAC Validate WorkflowTemplate. Linked STAC items linked to from a STAC collection will also be validated.
-
-The STAC Validate Workflow will validate each collection (and linked items/assets) in a separate pod so that multiple collections can be processed in parallel.
-
-Access permissions are controlled by the [Bucket Sharing Config](https://github.com/linz/topo-aws-infrastructure/blob/master/src/stacks/bucket.sharing.ts) which gives Argo Workflows access to the S3 buckets we use.
+This Workflow will validate each collection (and linked items/assets) in a separate pod so that multiple collections can be processed in parallel, using the `tpl-at-stac-validate` template.

 ## Workflow Outputs

@@ -3,7 +3,7 @@
 apiVersion: argoproj.io/v1alpha1
 kind: WorkflowTemplate
 metadata:
-  name: stac-validate
+  name: stac-validate-parallel
   labels:
     linz.govt.nz/category: stac
 spec:
@@ -29,18 +29,23 @@ spec:
   templateDefaults:
     container:
       imagePullPolicy: Always
+      image: ''
   templates:
     - name: main
       dag:
         tasks:
           - name: aws-list-collections
             template: aws-list-collections
           - name: stac-validate-collections
-            template: stac-validate-collections
+            templateRef:
+              name: tpl-at-stac-validate
+              template: main
             arguments:
               parameters:
-                - name: file
+                - name: uri
                   value: '{{item}}'
+                - name: checksum
+                  value: '{{workflow.parameters.checksum}}'
             depends: aws-list-collections
             withParam: '{{tasks.aws-list-collections.outputs.parameters.files}}'
     - name: aws-list-collections
@@ -67,27 +72,3 @@ spec:
           - name: files
             valueFrom:
               path: /tmp/file_list.json
-    - name: stac-validate-collections
-      inputs:
-        parameters:
-          - name: file
-      container:
-        image: '019359803926.dkr.ecr.ap-southeast-2.amazonaws.com/argo-tasks:{{=sprig.trim(workflow.parameters.version_argo_tasks)}}'
-        resources:
-          requests:
-            cpu: 15000m
-            memory: 7.8Gi
-        command: [node, /app/index.js]
-        env:
-          - name: AWS_ROLE_CONFIG_PATH
-            value: s3://linz-bucket-config/config.json
-        args:
-          [
-            'stac',
-            'validate',
-            '--concurrency',
-            '50',
-            '--recursive',
-            '--checksum={{workflow.parameters.checksum}}',
-            '{{inputs.parameters.file}}',
-          ]
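The `withParam` loop on `stac-validate-collections` runs the referenced template once per entry of the `files` output. The sketch below illustrates this under two assumptions: that `/tmp/file_list.json` holds the two hypothetical URIs shown in the comment, and that each entry is a plain URI string. The loop would then behave roughly like two independent tasks. The task names are illustrative and are not the node names Argo generates.

```yaml
# Hypothetical content of /tmp/file_list.json:
#   ["s3://my-bucket/a/collection.json", "s3://my-bucket/b/collection.json"]
# Conceptual expansion of the withParam loop, one task per list entry.
- name: validate-a # illustrative name
  templateRef:
    name: tpl-at-stac-validate
    template: main
  arguments:
    parameters:
      - name: uri
        value: 's3://my-bucket/a/collection.json'
      - name: checksum
        value: '{{workflow.parameters.checksum}}'
  depends: aws-list-collections
- name: validate-b # illustrative name
  templateRef:
    name: tpl-at-stac-validate
    template: main
  arguments:
    parameters:
      - name: uri
        value: 's3://my-bucket/b/collection.json'
      - name: checksum
        value: '{{workflow.parameters.checksum}}'
  depends: aws-list-collections
```

Because `recursive` and `concurrency` are not passed, each run would fall back to the template defaults, which match the `--recursive` and `--concurrency 50` values of the removed inline template.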
