Discovery endpoint should submit STAC items for all discovered S3 objects #192

Closed
1 task done
anayeaye opened this issue Jul 20, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@anayeaye
Contributor

anayeaye commented Jul 20, 2024

What

The discovery/ endpoint discovers more objects than are published to STAC. Generally only 9 or 10 items make it to the STAC catalog, which suggests a batch may be dropped when the discovery DAG transitions from raster_vector_branching to parralel_run_process_rasters. No jobs fail in Airflow.

Note
When the same regex is supplied via dataset/publish, all 19 items are created.
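
The 9-or-10 count is at least consistent with one of two batches going missing. The sketch below only illustrates that arithmetic and is not a diagnosis: the batch size of 10 and the `chunk` helper are assumptions and do not come from the DAG code.

```python
def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# 19 placeholder object keys standing in for the discovered S3 objects.
keys = [f"object-{n:02d}.tif" for n in range(19)]

# Hypothetical batch size; the value used by the discovery DAG is not stated in this issue.
batches = chunk(keys, 10)
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)  # [10, 9]
```

If only one of those two batches were processed, either 10 or 9 items would be published, which matches the counts seen here.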

How to reproduce

  1. POST a collection via the ingest-api/collections endpoint
collection.json
{
  "id": "omi-19-item-collection-deleteme",
  "type": "Collection",
  "links": [],
  "title": "DELETE ME 19 item collection OMI_trno2",
  "extent": {
    "spatial": {
      "bbox": [
        [-180, -90, 180, 90]
      ]
    },
    "temporal": {
      "interval": [
        [null, null]
      ]
    }
  },
  "license": "MIT",
  "description": "OMI_trno2 - 0.10 x 0.10 Annual as Cloud-Optimized GeoTIFFs (COGs)",
  "stac_version": "1.0.0",
  "renders": {
    "dashboard": {
      "colormap_name": "reds",
      "rescale": [
        [
          0,
          3000000000000000.0
        ]
      ],
      "assets": [
        "cog_default"
      ],
      "title": "VEDA Dashboard Render Parameters"
    }
  },
  "providers": [
    {
      "name": "NASA VEDA",
      "url": "https://www.earthdata.nasa.gov/dashboard/",
      "roles": [
        "host"
      ]
    }
  ],
  "item_assets": {
    "test_asset": {
      "title": "An item asset description for test",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": ["test"]
    },
    "cog_default": {
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": [
        "data",
        "layer"
      ],
      "title": "Default COG Layer",
      "description": "Cloud optimized default layer to display on map"
    }
  },
  "assets": {
    "thumbnail": {
      "title": "Thumbnail",
      "description": "Photo by [Mick Truyts](https://unsplash.com/photos/x6WQeNYJC1w) (Power plant shooting steam at the sky)",
      "href": "https://thumbnails.openveda.cloud/no2--dataset-cover.jpg",
      "type": "image/jpeg",
      "roles": ["thumbnail"]
    }
  }
}
  2. Trigger a discovery via the workflows api/discovery endpoint
discovery-config.json
{
    "collection": "omi-19-item-collection-deleteme",
    "bucket": "veda-data-store-staging",
    "datetime_range": "year",
    "discovery": "s3",
    "filename_regex": "^(.*).tif$",
    "prefix": "OMI_trno2-COG/"
}
  3. Compare the number of items published with the number of objects discovered in the discovery DAG log. The example above should create 19 items.
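
The comparison in the last step can be sketched roughly as follows. This is a minimal sketch, assuming the regex is matched against the full object key and that each item id is the file name without its extension; neither assumption is confirmed by this issue. `matching_keys` and `missing_item_ids` are hypothetical helpers, and the key and id lists are placeholders for what would come from listing the bucket and querying the STAC API.

```python
import re
from pathlib import PurePosixPath


def matching_keys(keys, prefix, filename_regex):
    """Return the keys under `prefix` that match `filename_regex`.

    Assumption: the pattern is matched against the full object key.
    """
    pattern = re.compile(filename_regex)
    return [k for k in keys if k.startswith(prefix) and pattern.match(k)]


def missing_item_ids(keys, published_ids):
    """Return expected item ids that have no published counterpart.

    Assumption: an item id is the file name without its extension.
    """
    expected = [PurePosixPath(k).stem for k in keys]
    published = set(published_ids)
    return [item_id for item_id in expected if item_id not in published]


# Placeholder data, not real bucket contents.
keys = [
    "OMI_trno2-COG/example_2005.tif",
    "OMI_trno2-COG/example_2006.tif",
    "OMI_trno2-COG/readme.txt",
    "other-prefix/example_2005.tif",
]
published_ids = ["example_2005"]

discovered = matching_keys(keys, "OMI_trno2-COG/", "^(.*).tif$")
missing = missing_item_ids(discovered, published_ids)
print(len(discovered), len(published_ids), missing)  # 2 1 ['example_2006']
```

A non-empty `missing` list would point at the objects that were discovered but never published.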

AC

  • A STAC item is published for each S3 object detected in the summary job
@anayeaye anayeaye added the bug Something isn't working label Jul 20, 2024
@ividito ividito self-assigned this Aug 1, 2024
@anayeaye
Contributor Author

Since opening this issue, a new bug (#194) has appeared; as a temporary workaround it requires adding "id_template": "{}" to the discovery config.
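
Applied to the discovery config from the reproduction steps, the workaround might look like the sketch below. Only the `id_template` key and its value come from this comment; building the payload in Python is just one convenient way to do it.

```python
import json

# Discovery config from the reproduction steps in this issue.
discovery_config = {
    "collection": "omi-19-item-collection-deleteme",
    "bucket": "veda-data-store-staging",
    "datetime_range": "year",
    "discovery": "s3",
    "filename_regex": "^(.*).tif$",
    "prefix": "OMI_trno2-COG/",
}

# Temporary workaround for #194: add an explicit id_template.
payload = {**discovery_config, "id_template": "{}"}

print(json.dumps(payload, indent=4))
```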

@anayeaye
Contributor Author

Recent changes in dev may have already resolved this issue. I don't know which change to trace this to, but:

@smohiudd
Contributor

The concurrency PR has been merged: #197
