From 0be7067f2d5c8673d95a6c24da61516cc8bef976 Mon Sep 17 00:00:00 2001 From: Alex Mandel Date: Fri, 15 Sep 2023 16:57:51 -0700 Subject: [PATCH] fix: split GeoTiff details to new page --- cloud-optimized-geotiffs/cogs-details.ipynb | 98 ++++++++++++++++++++ cloud-optimized-geotiffs/cogs-examples.ipynb | 58 ------------ 2 files changed, 98 insertions(+), 58 deletions(-) create mode 100644 cloud-optimized-geotiffs/cogs-details.ipynb diff --git a/cloud-optimized-geotiffs/cogs-details.ipynb b/cloud-optimized-geotiffs/cogs-details.ipynb new file mode 100644 index 0000000..b2487ac --- /dev/null +++ b/cloud-optimized-geotiffs/cogs-details.ipynb @@ -0,0 +1,98 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "e84fbc0b", + "metadata": {}, + "source": [ + "# COG Format Details\n", + "\n", + "In the [COG Intro](intro.ipynb) you can see what makes a COG different from non-optimized GeoTIFFs. The rest of this page covers additional GeoTIFF details that are not COG requirements but can help make your files as useful as possible." + ] + }, + { + "cell_type": "markdown", + "id": "0d780549-1ffb-45cd-a7ac-969072d46137", + "metadata": {}, + "source": [ + "# Data Type\n", + "\n", + "**Recommendation** Use the smallest data type that still represents the data appropriately. Shifting data from float to integer by multiplying by a scale factor, a space-saving technique, is generally not recommended, because end users then need to undo this step to use the data. Data compression is preferred; see also [Compression](#compression).\n", + "\n", + "GeoTIFF format supports many data types. The key is that all bands must be of the same data type. Unlike some other formats, you cannot mix and match integers (whole numbers) and floats (decimal numbers) in the same file. 
If you have this use case, consider splitting files by data type and using a catalog like STAC to keep track of them, or look at other formats like [Zarr](../zarr/intro.qmd).\n", + "\n", + "Scenario: If the COG is intended only for visualization, conversion to 3-band byte will improve performance.\n", + "\n", + "> GDAL supported Data Types [list](https://gdal.org/drivers/raster/gtiff.html#gtiff-geotiff-file-format)\n", + "\n", + "\n", + "# Compression (aka File Size)\n", + "\n", + "The biggest benefit of compression is on the storage side. A lossless compression method is always recommended. **Deflate** and **LZW** are the most commonly recommended compression algorithms; the best choice depends on the data type and distribution, and on whether maximum compression is the goal. Maximum compression does result in some performance loss." + ] + }, + { + "cell_type": "markdown", + "id": "9fbe71f0-20b9-4a1b-8e7f-de52a90cc7c6", + "metadata": {}, + "source": [ + "# No Data\n", + "Setting a no data value makes it clear to users and visualization tools which pixels are not actually data. For visualization, this allows these pixels to be easily hidden (transparent). Historically, many values have been used (0, -9999, etc.). The key is to make sure the GDAL flag for no data is set. It is also suggested that the smallest (most negative) value of the data type be used instead of an arbitrary value. For byte and unsigned integer types this will be 0; if 0 has meaning in your data, use a different value (like the max possible value). Having the right nodata flag set is important for overview generation.\n", + "\n", + "# Projection\n", + "\n", + "Read performance can be greatly impacted by the choice of projection and by the particular applications used for dynamic tile serving. Using a known CRS defined in the PROJ database (typically an EPSG code) is preferred over custom projections. Load times can be 5-20 times longer when using a custom projection. 
When applying projections, make sure to use the WKT2 representation. With a database of known projections (i.e. EPSG codes) this should be fine, but there are known issues around manually setting proj-strings.\n" + ] + }, + { + "cell_type": "markdown", + "id": "1e9d49cc-7e1a-4e7b-94a3-539b8e51f0c7", + "metadata": {}, + "source": [ + "## What we don’t know (areas of research)\n", + "\n", + "* The data size at which splitting into a multi-file dataset performs better than a single file.\n", + "* When to recommend particular internal tile sizes.\n", + "* Compression impacts on HTTP transfer rates.\n", + "* Support for COG creation varies across common geospatial tools.\n" + ] + }, + { + "cell_type": "markdown", + "id": "d193ab02-bb69-455e-9b72-5b89728f086e", + "metadata": {}, + "source": [ + "## Additional Resources\n", + "\n", + "* [An Introduction to Cloud Optimized GeoTIFFS (COGs) Part 1: Overview](https://developers.planet.com/docs/planetschool/an-introduction-to-cloud-optimized-geotiffs-cogs-part-1-overview/)\n", + "* [Do you really want people using your data?](https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.10.12 64-bit", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + }, + "vscode": { + "interpreter": { + "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/cloud-optimized-geotiffs/cogs-examples.ipynb b/cloud-optimized-geotiffs/cogs-examples.ipynb index 8cd8757..3f71a91 100644 --- a/cloud-optimized-geotiffs/cogs-examples.ipynb +++ b/cloud-optimized-geotiffs/cogs-examples.ipynb 
@@ -732,64 +732,6 @@ "source": [ "show_overviews(tmp_cog)" ] - }, - { - "cell_type": "markdown", - "id": "0d780549-1ffb-45cd-a7ac-969072d46137", - "metadata": {}, - "source": [ - "# Data Type\n", - "\n", - "**Recommendation** The smallest possible data type, that still represents the data appropriately, should be used. It is not generally recommended to shift data from float to integer by multiplying, a space saving technique, as end users then need to undo this step to use the data. Data compression is preferred, see also [Compression](#compression).\n", - "\n", - "GeoTIFF format supports many data types. The key is that all bands must be of the same data type. Unlike some other formats you can not mix and match integers (whole numbers) and floats (decimal numbers) in the same file. If you have this use case consider splitting files by data type and using a catalog like STAC to keep track of them, or look at other formats like [Zarr](../zarr/intro.qmd).\n", - "\n", - "Scenario: If the COG is intended only for visualization, conversion to 3 band byte will improve performance. \n", - "\n", - "> GDAL supported Data Types [list](https://gdal.org/drivers/raster/gtiff.html#gtiff-geotiff-file-format)\n", - "\n", - "\n", - "# Compression (aka File Size)\n", - "\n", - "The biggest benefit to compression is on the storage side. It’s always recommended to use a lossless compression method. **Deflate** or **LZW** are the most recommended compression algorithms, there are some choices that depend on the data type and distribution, and if the goal is maximum compression or not. Maximum compression does result in some performance loss." - ] - }, - { - "cell_type": "markdown", - "id": "9fbe71f0-20b9-4a1b-8e7f-de52a90cc7c6", - "metadata": {}, - "source": [ - "# No Data\n", - "Setting a no data value makes it clear to users and visualization tools what pixels are not actually data. For visualization this allows these pixels to be easily hidden (transparent). 
Historically many values have been used, 0, -9999, etc… The key is to make sure the GDAL flag for no data is set. It is also suggested that the smallest negative value be used instead of a random value. For byte and unsigned integers/floats this will be 0, if 0 has meaning in your data use a different value (like the max possible value). Having the right nodata flag set is important for overview generation.\n", - "\n", - "# Projection\n", - "\n", - "Read performance can be greatly impacted by the choice of projection and the particular applications used for dynamic tile serving. Using a known CRS defined in the PROJ database (typically EPSG code) is preferred over custom projections. Load times can be 5-20 times greater when using a custom projection. Whenever applying projections make sure to use WKT2 representation. If using a database of known projections, i.e. EPSG codes, this should be fine, there are known issues around manually setting proj-strings.\n" - ] - }, - { - "cell_type": "markdown", - "id": "1e9d49cc-7e1a-4e7b-94a3-539b8e51f0c7", - "metadata": {}, - "source": [ - "## What we don’t know (areas of research)\n", - "\n", - "* The optimum size of data at which splitting across files improves performance as a multi-file dataset instead of a single file.\n", - "* When to recommend particular internal tile sizes\n", - "* Compression impacts on http transfer rates.\n", - "* Support for COG creation in all common Geospatial tools varies.\n" - ] - }, - { - "cell_type": "markdown", - "id": "d193ab02-bb69-455e-9b72-5b89728f086e", - "metadata": {}, - "source": [ - "## Additional Resources\n", - "\n", - "* [An Introduction to Cloud Optimized GeoTIFFS (COGs) Part 1: Overview](https://developers.planet.com/docs/planetschool/an-introduction-to-cloud-optimized-geotiffs-cogs-part-1-overview/)\n", - "* [Do you really want people using your data?](https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f)" - ] } ], "metadata": {