allow setting tiff scale + offset + custom metadata #317

Closed
jdries opened this issue Aug 23, 2024 · 15 comments · Fixed by Open-EO/openeo-geopyspark-driver#885 or #340

@jdries
Contributor

jdries commented Aug 23, 2024

Add a format option to set TIFF metadata. Some of these are regular TIFF tags; others will have to be encoded as GDAL band metadata.

The options in GeoTrellis are limited, but some very basic TIFF tools are available on Linux:
https://linux.die.net/man/1/tiffset
The idea is to set metadata tags without touching the rest of the file, avoiding a full rewrite. Ideally this is done right after GeoTrellis writes the TIFF.

The format option will have to be passed through via save_result.

@VictorVerhaert

example of custom metadata: https://github.com/VITO-RS-Vegetation/lcfm-production/issues/18

@jdries
Contributor Author

jdries commented Aug 23, 2024

For tiffset, we would need to add libtiff-tools to the container.

@JorisCod

JorisCod commented Sep 5, 2024

The current metadata, visible through gdalinfo, is:

Metadata:
PROCESSING_SOFTWARE=0.39.0a1
AREA_OR_POINT=Area

Both can stay, but what we are looking for is:

Metadata:
AREA_OR_POINT=Area
bands=['s2-B02-p10', 's2-B02-p25']
copyright=LCFM project 2020 / Contains modified Copernicus Sentinel data (2020) processed by LCFM consortium
creation_time=2024-07-19 00:35:04.693848
license=CC-BY 4.0 - https://creativecommons.org/licenses/by/4.0/
product_crs=EPSG:32629
product_grid=Sentinel-2 UTM tiling grid
product_tile=29TNE
product_type=LSF monthly median composite for band B02
reference=TODO
time_end=2020-02-29T23:59:59Z
time_start=2020-02-01T00:00:00Z
title=LCFM Monthly Land Surface Features (LSF-MONTHLY) product at 10m resolution for year 2020
version=v002-satio

So this would be something quite flexible. We are setting the metadata through rasterio (python) and passing a dictionary:

        if metadata:
            with rasterio.open(final_vrt_fn, 'r+') as dst:
                dst.update_tags(**metadata)

        if bands_names:
            with rasterio.open(final_vrt_fn, 'r+') as dst:
                for i, b in enumerate(bands_names):
                    dst.set_band_description(i + 1, b)

Additionally, there is a difference in how we set the band description and how OpenEO does it:
OpenEO:
Band 2 Block=64x64 Type=Int16, ColorInterp=Undefined
NoData Value=-32768
Overviews: 1033x1542, 517x771, 259x386, 130x193, 65x97, 33x49
Metadata:
DESCRIPTION=B02_P25

Ours:
Band 17 Block=1024x1024 Type=UInt16, ColorInterp=Undefined
Description = s2-B08-p25
NoData Value=65535
Offset: 0.0031999999191612, Scale:8.16687767161827e-06

In openEO the description is stored at the metadata level, so the band names are not displayed in the symbology in QGIS.

Relatedly, as the output shows, every band has an offset and a scale. In principle, these can differ per band.
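
To make the role of these two numbers concrete: GDAL's convention is that the physical value of a pixel is the stored value times the scale, plus the offset. Below is a minimal sketch of that arithmetic using the scale, offset and nodata values from the band shown above. The helper name `to_physical` is made up for illustration and is not part of any library.

```python
# Values copied from the gdalinfo output above (band 17).
SCALE = 8.16687767161827e-06
OFFSET = 0.0031999999191612
NODATA = 65535  # nodata value of the same UInt16 band


def to_physical(raw, scale=1.0, offset=0.0, nodata=None):
    """Convert a stored pixel value to a physical value (hypothetical helper).

    Follows the GDAL convention: physical = raw * scale + offset.
    Returns None for nodata pixels.
    """
    if nodata is not None and raw == nodata:
        return None
    return raw * scale + offset


print(to_physical(0, SCALE, OFFSET, NODATA))      # smallest representable value: the offset
print(to_physical(10000, SCALE, OFFSET, NODATA))  # some value in between
print(to_physical(65535, SCALE, OFFSET, NODATA))  # nodata pixel
```

Because scale and offset are stored per band, each band can use the integer range of its data type for its own value range.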

@JorisCod

Just checking whether this issue is already planned?
If it's too extensive, can the scaling and offset be done first and the metadata in a separate issue?

@JorisCod

We set the scale and offset through gdal_translate:
https://gdal.org/en/latest/programs/gdal_translate.html#cmdoption-gdal_translate-a_scale
but there are likely other options.

@bossie
Collaborator

bossie commented Sep 24, 2024

Regardless of how we put them in the GeoTiff, will the user also provide values for scale/offset in the format options?

@bossie
Collaborator

bossie commented Sep 24, 2024

I could get this to work:

tiffset -s 42112 '<GDALMetadata>
  <Item name="PROCESSING_SOFTWARE">0.40.1a1</Item>
  <Item name="DESCRIPTION" sample="0">red</Item>
  <Item name="SCALE" sample="0" role="scale">1.23</Item>
  <Item name="OFFSET" sample="0" role="offset">4.56</Item>
</GDALMetadata>' test_load_stac_datacube_parameters.tif

where:

  • 42112 is the code for the GDAL metadata tag;
  • sample is the zero-based band index (sample="0" is the first band);
  • role is necessary to get GDAL to treat SCALE/OFFSET as actual scale/offset (not supported by Geotrellis).
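
For illustration, the XML passed to tiffset above could be generated rather than written by hand. The following is only a sketch using Python's standard library; the function name `build_gdal_metadata` and the item dictionaries are assumptions made for this example and do not reflect how the driver builds the tag.

```python
import xml.etree.ElementTree as ET


def build_gdal_metadata(items):
    """Build a GDALMetadata XML string (hypothetical helper).

    `items` is a list of dicts with the keys "name" and "value", plus the
    optional keys "sample" (band index as used in the example above) and "role".
    """
    root = ET.Element("GDALMetadata")
    for item in items:
        attrs = {"name": item["name"]}
        if "sample" in item:
            attrs["sample"] = str(item["sample"])
        if "role" in item:
            attrs["role"] = item["role"]
        element = ET.SubElement(root, "Item", attrs)
        element.text = str(item["value"])
    return ET.tostring(root, encoding="unicode")


# Same items as in the tiffset command above.
xml_text = build_gdal_metadata([
    {"name": "PROCESSING_SOFTWARE", "value": "0.40.1a1"},
    {"name": "DESCRIPTION", "sample": 0, "value": "red"},
    {"name": "SCALE", "sample": 0, "role": "scale", "value": 1.23},
    {"name": "OFFSET", "sample": 0, "role": "offset", "value": 4.56},
])
print(xml_text)
```

Going through an XML library instead of string concatenation means values containing characters such as `<` or `&` are escaped correctly.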

bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Sep 26, 2024
bossie added a commit to Open-EO/openeo-geotrellis-kubernetes that referenced this issue Sep 26, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Sep 26, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Sep 26, 2024
The spec demands that TIFF tiles have a tile size that is a multiple of 16. We write GeoTIFFs
with a tile size that is equal to the TileLayerRDD's tile size so a tile size of 4 will produce GeoTiffs
that are not compliant. Some tools cope, like gdalinfo (with a warning), but some will simply fail,
like tiffset.

Open-EO/openeo-geotrellis-extensions#317
bossie added a commit to Open-EO/openeo-geotrellis-kubernetes that referenced this issue Sep 26, 2024
bossie added a commit to Open-EO/openeo-geotrellis-kubernetes that referenced this issue Sep 26, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Sep 26, 2024
@bossie bossie linked a pull request Sep 27, 2024 that will close this issue
@bossie
Collaborator

bossie commented Sep 30, 2024

Runs locally but tests in Jenkins fail with a segmentation fault:

subprocess.CalledProcessError: Command '['tiffset', '-s', '42112', '0.40.1a1TileRowTileCol', '/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-885/pytest-tmp/pytest-of-jenkins/pytest-0/test_separate_asset_per_band_l0/openEO_2021-06-05Z_TileRow.tif']' died with <Signals.SIGSEGV: 11>.

@bossie
Collaborator

bossie commented Oct 2, 2024

Possible method to get core dumps within the container: https://stackoverflow.com/a/72048923.

@jdries
Contributor Author

jdries commented Oct 10, 2024

The segfault can be reproduced in Kubernetes, but it seems to depend on the exact tag and value: I was able to set the description tag, for instance.

The dmesg entries for the segfaults are not very helpful:

[782319.609931] tiffset[2919160]: segfault at 6d ip 00007f1e26e96895 sp 00007fff6675e9b8 error 4 in libc-2.28.so[7f1e26d36000+1bc000]
[782319.612686] Code: c7 80 00 00 00 48 81 fa 80 00 00 00 77 bc c5 fe 7f 29 c5 fe 7f 71 e0 c5 fe 7f 79 c0 c5 7e 7f 41 a0 c4 c1 7e 7f 23 c5 f8 77 c3 <c5> fe 6f 26 c5 fe 6f 6e 20 c5 fe 6f 76 40 c5 fe 6f 7e 60 c5 7e 6f
[782340.241434] tiffset[2919401]: segfault at 6d ip 00007f95ff2b1895 sp 00007fff7ca95a68 error 4 in libc-2.28.so[7f95ff151000+1bc000]
[782340.242934] Code: c7 80 00 00 00 48 81 fa 80 00 00 00 77 bc c5 fe 7f 29 c5 fe 7f 71 e0 c5 fe 7f 79 c0 c5 7e 7f 41 a0 c4 c1 7e 7f 23 c5 f8 77 c3 <c5> fe 6f 26 c5 fe 6f 6e 20 c5 fe 6f 76 40 c5 fe 6f 7e 60 c5 7e 6f

@jdries
Contributor Author

jdries commented Oct 10, 2024

Found a potential solution: you can also set tag values via a file, and that seems to avoid the segfault.
This worked for me:
tiffset -sf 42112 tag.txt ESA_WorldCover_10m_2021_v200_S55W071_S2RGBNIR_v2.tif
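
A rough sketch of that file-based workaround in Python, assuming tiffset is on the PATH: write the tag value to a temporary file and pass the file to `tiffset -sf` instead of putting the value on the command line. The function names are made up for this example, and this is not the code that ended up in the driver.

```python
import os
import subprocess
import tempfile

GDAL_METADATA_TAG = 42112  # code of the GDAL metadata tag, as mentioned earlier in the thread


def write_tag_file(xml_text):
    """Write the tag value to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
        f.write(xml_text)
        return f.name


def tiffset_command(tag_file, tiff_path):
    """Build the `tiffset -sf` command that reads the tag value from a file."""
    return ["tiffset", "-sf", str(GDAL_METADATA_TAG), tag_file, tiff_path]


def embed_gdal_metadata(xml_text, tiff_path):
    """Set the GDAL metadata tag of an existing TIFF in place.

    Requires the tiffset executable; raises CalledProcessError if it fails.
    """
    tag_file = write_tag_file(xml_text)
    try:
        subprocess.run(tiffset_command(tag_file, tiff_path),
                       check=True, capture_output=True, text=True)
    finally:
        os.remove(tag_file)  # the temporary file is only needed during the call


# Only prints the command; call embed_gdal_metadata(...) to actually run it.
print(tiffset_command("tag.txt", "example.tif"))
```

Capturing the output of the subprocess makes it possible to include tiffset's own error message when the call fails.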

@bossie
Collaborator

bossie commented Oct 10, 2024

Oh dear.

That does seem to work. 👍

bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 10, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 10, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 11, 2024
@bossie
Collaborator

bossie commented Oct 14, 2024

Remaining work:

  • apply in all code paths;
  • check interaction with separate_asset_per_band;
  • address TODOs incl. adapting some tests that write GTiffs with 4x4 tiles.

bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 23, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 24, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 24, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 24, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 25, 2024
bossie added a commit that referenced this issue Oct 25, 2024
@bossie bossie linked a pull request Oct 25, 2024 that will close this issue
bossie added a commit that referenced this issue Oct 25, 2024
bossie added a commit that referenced this issue Oct 25, 2024
bossie added a commit that referenced this issue Oct 25, 2024
* support embedding GDALMetadata with tiffset #317

* support construction of GDALMetadata #317

* simplify #317

* cleanup #317
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 28, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 28, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 28, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 28, 2024
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.openeo.geotrellis.geotiff.package.saveRDDAllowAssetPerBand.
: java.io.IOException: tiffset -sf 42112 /tmp/GDALMetadata_14266360818144674820.xml.tmp /tmp/18350960666773006729.tif failed; output was: _TIFFVSetField: /tmp/18350960666773006729.tif: Bad value 5 for "TileWidth" tag.
	at org.openeo.geotrellis.geotiff.package$.embedGdalMetadata(package.scala:940)
	at org.openeo.geotrellis.geotiff.package$.$anonfun$writeGeoTiff$2(package.scala:839)
	at org.openeo.geotrellis.geotiff.package$.$anonfun$writeGeoTiff$2$adapted(package.scala:839)
	at scala.Option.foreach(Option.scala:407)
	at org.openeo.geotrellis.geotiff.package$.writeGeoTiff(package.scala:839)
	at org.openeo.geotrellis.geotiff.package$.writeTiff(package.scala:588)
	at org.openeo.geotrellis.geotiff.package$.saveRDDGeneric(package.scala:415)
	at org.openeo.geotrellis.geotiff.package$.saveRDDAllowAssetPerBand(package.scala:254)
	at org.openeo.geotrellis.geotiff.package.saveRDDAllowAssetPerBand(package.scala)
	at jdk.internal.reflect.GeneratedMethodAccessor439.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:829)

Open-EO/openeo-geotrellis-extensions#317
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 28, 2024
@bossie
Collaborator

bossie commented Oct 28, 2024

Specify band-specific scale, offset and other metadata in the GTiff format options like so:

{
  "bands_metadata": {
    "Flat:1": {
      "SCALE": 1.23
    },
    "Flat:2": {
      "OFFSET": 4.56
    },
    "Flat:3": {
      "SCALE": 7.89,
      "OFFSET": 10.11,
      "ARBITRARY": "value"
    }
  }
}
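
As an illustration of how such options could map onto per-band GDAL metadata items, here is a small Python sketch. It assumes that SCALE and OFFSET get the matching role attribute while other keys are written as plain band metadata, and that the sample index is the zero-based position of the band in the output. The function `band_items` and the band order are hypothetical; the actual implementation may differ.

```python
# Keys that need a "role" so GDAL treats them as actual scale/offset (assumption).
ROLES = {"SCALE": "scale", "OFFSET": "offset"}


def band_items(bands_metadata, band_names):
    """Flatten bands_metadata into a list of metadata items (hypothetical helper).

    `band_names` is the band order of the output file; the index of a band in
    that list is used as the sample number. Unknown bands raise ValueError.
    """
    items = []
    for band_name, tags in bands_metadata.items():
        sample = band_names.index(band_name)
        for name, value in tags.items():
            item = {"name": name, "sample": sample, "value": value}
            if name in ROLES:
                item["role"] = ROLES[name]
            items.append(item)
    return items


options = {
    "bands_metadata": {
        "Flat:1": {"SCALE": 1.23},
        "Flat:2": {"OFFSET": 4.56},
        "Flat:3": {"SCALE": 7.89, "OFFSET": 10.11, "ARBITRARY": "value"},
    }
}

for item in band_items(options["bands_metadata"], ["Flat:1", "Flat:2", "Flat:3"]):
    print(item)
```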

bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 28, 2024
* use tiffset to write GDAL metadata with "role" support

Open-EO/openeo-geotrellis-extensions#317

* encapsulate tiffset

Open-EO/openeo-geotrellis-extensions#317

* fix tests that rely on test layer with tile size 4

The spec demands that TIFF tiles have a tile size that is a multiple of 16. We write GeoTIFFs
with a tile size that is equal to the TileLayerRDD's tile size so a tile size of 4 will produce GeoTiffs
that are not compliant. Some tools cope, like gdalinfo (with a warning), but some will simply fail,
like tiffset.

Open-EO/openeo-geotrellis-extensions#317

* fix some tests

Open-EO/openeo-geotrellis-extensions#317

* try workaround for tiffset segfault in container

Open-EO/openeo-geotrellis-extensions#317

* fix tests

Open-EO/openeo-geotrellis-extensions#317

* add TODOs

Open-EO/openeo-geotrellis-extensions#317

* test tiffset

Open-EO/openeo-geotrellis-extensions#317

* fix tests

Open-EO/openeo-geotrellis-extensions#317

* fix interaction with asset-per-band

Open-EO/openeo-geotrellis-extensions#317

* include details if tiffset fails

Open-EO/openeo-geotrellis-extensions#317

* simplify asset-per-band mapping

Open-EO/openeo-geotrellis-extensions#317

* tiffset is called from Scala instead + adapt/add tests

Open-EO/openeo-geotrellis-extensions#317

* cleanup

Open-EO/openeo-geotrellis-extensions#317

* fix tests

KeyError: DESCRIPTION

Open-EO/openeo-geotrellis-extensions#317

* quick fix test

Open-EO/openeo-geotrellis-extensions#317

* update version and CHANGELOG

Open-EO/openeo-geotrellis-extensions#317
@bossie
Collaborator

bossie commented Oct 28, 2024

Available on CDSE-dev.

Example process graph:

{
  "process_graph": {
    "loadcollection1": {
      "arguments": {
        "bands": [
          "B04",
          "B03",
          "B02"
        ],
        "id": "SENTINEL2_L2A",
        "spatial_extent": {
          "east": 5.506705419643688,
          "north": 50.35658998840864,
          "south": 50.35253773281718,
          "west": 5.50057681627676
        },
        "temporal_extent": [
          "2024-04-04T00:00:00Z",
          "2024-05-04T00:00:00Z"
        ]
      },
      "process_id": "load_collection"
    },
    "saveresult1": {
      "arguments": {
        "data": {
          "from_node": "loadcollection1"
        },
        "format": "GTIFF",
        "options": {
          "bands_metadata": {
            "B04": {"SCALE": 1.23},
            "B03": {"OFFSET": 4.56},
            "B02": {"SCALE": 7.89, "OFFSET": 10.11}
          }
        }
      },
      "process_id": "save_result",
      "result": true
    }
  }
}

From gdalinfo on one of the assets:

Band 1 Block=128x128 Type=Int16, ColorInterp=Red
  Description = B04
  NoData Value=-32768
  Offset: 0,   Scale:1.23
Band 2 Block=128x128 Type=Int16, ColorInterp=Green
  Description = B03
  NoData Value=-32768
  Offset: 4.56,   Scale:1
Band 3 Block=128x128 Type=Int16, ColorInterp=Blue
  Description = B02
  NoData Value=-32768
  Offset: 10.11,   Scale:7.89
