From 0d9a4a0aa97c706760cc7463318d71a9075e19d5 Mon Sep 17 00:00:00 2001 From: Alex Mandel Date: Fri, 15 Sep 2023 15:13:36 -0700 Subject: [PATCH] feat: updated to COG pages Some links, added text, minor reorg --- cloud-optimized-geotiffs/cogs-examples.ipynb | 52 +++++++++++++------- cloud-optimized-geotiffs/intro.ipynb | 25 ++++++---- 2 files changed, 49 insertions(+), 28 deletions(-) diff --git a/cloud-optimized-geotiffs/cogs-examples.ipynb b/cloud-optimized-geotiffs/cogs-examples.ipynb index 2ecc6b6..8cd8757 100644 --- a/cloud-optimized-geotiffs/cogs-examples.ipynb +++ b/cloud-optimized-geotiffs/cogs-examples.ipynb @@ -173,6 +173,14 @@ "veg_gtiff_filename = f\"{test_data_dir}/{veg_files[0]}\"" ] }, + { + "cell_type": "markdown", + "id": "13da6953", + "metadata": {}, + "source": [ + "> To learn more about the example data see the [Vegetation Continuous Fields (VCF) information page](https://lpdaac.usgs.gov/products/vcf5kyrv001/)." + ] + }, { "cell_type": "markdown", "id": "451dbe01", @@ -297,17 +305,13 @@ "id": "ff2cc531", "metadata": {}, "source": [ - "## Dimensions\n", - "\n", - "This attribute is also sometimes called **chunks** or **internal tiles**.\n", + "## Data Structure\n", "\n", - "Dimensions are the number of bands, rows and columns stored in a GeoTIFF. There is a tradeoff between storing lots of data in one GeoTIFF and storing less data in many GeoTIFFs. The larger a single file, the larger the GeoTIFF header and the multiple requests may be required just to read the spatial index before data retrieval. The opposite problem occurs if you make too many small files, then it takes many reads to retrieve data, and when rendering a combined visualization can greatly impact load time.\n", + "**Dimensions**\n", + "Dimensions are the number of bands, rows and columns stored in a GeoTIFF. 
[More Info](intro.ipynb#dimensions)\n", "\n", - "If you plan to pan and zoom a large amount of data through a tiling service in a web browser, there is a tradeoff between 1 large file, or many smaller files. The current recommendation is to meet somewhere in the middle, a moderate amount of medium files.\n", - "\n", - "### Internal Blocks\n", - "\n", - "Internal blocks are required if the dimensions of data are over 512x512. However you can control the size of the internal blocks. 256x256 or 512x512 are recommended. When displaying data at full resolution, or doing partial reading of data this size will impact the number of reads required. A size of 256 will take less time to read, and read less data outside the desired bounding box, however for reading large parts of a file, it may take more total read requests. Some clients will aggregate neighboring block reads to reduce the total number of requests. \n", + "**Internal Blocks** (aka chunks or internal tiles)\n", + "Internal blocks are required if the dimensions of data are over 512x512. [More Info](intro.ipynb#internal-blocks)\n", "\n", "Let's check out the dimensions and blocks of our GeoTIFF and Cloud-Optimized GeoTIFF." ] @@ -423,15 +427,15 @@ "## Overviews\n", "\n", "Overviews are downsampled (aggregated) data intended for visualization.\n", - "The best resampling algorithm depends on the range, type, and distribution of the data.\n", "\n", - "The smallest size overview should match the tiling components’ fetch size, typically 256x256. Due to aspect ratio variation just aim to have at least one dimension at or slightly less than 256. The COG driver in GDAL, or rio cogeo tools should do this.\n", + "The smallest size overview should match the tiling components’ fetch size, typically 256x256. Due to aspect ratio variation just aim to have at least one dimension at or slightly less than 256. 
\n", + "> The COG driver in GDAL, or rio cogeo tools should do this.\n", "\n", - "There are many resampling algorithms for generating overviews. When creating overviews several options should be compared before deciding which resampling method to apply.\n", + "There are many resampling algorithms for generating overviews. The best resampling algorithm depends on the range, type, and distribution of the data. When creating overviews several options should be compared before deciding which resampling method to apply. \n", "\n", "GDAL >= 3.2 allows for the overview resampling method to be set directly.\n", "\n", - "TODO: need to add hints on how to check which resampling method to use for overviews. Possibly provide code for comparing." + "" ] }, { @@ -736,13 +740,18 @@ "source": [ "# Data Type\n", "\n", - "The smallest possible data type, that still represents the data appropriately, should be used. It is not generally recommended to shift data from float to integer by multiplying, a space saving technique, as end users then need to undo this step to use the data. Data compression is preferred, see also Compression.\n", + "**Recommendation** The smallest possible data type, that still represents the data appropriately, should be used. It is not generally recommended to shift data from float to integer by multiplying, a space saving technique, as end users then need to undo this step to use the data. Data compression is preferred, see also [Compression](#compression).\n", + "\n", + "GeoTIFF format supports many data types. The key is that all bands must be of the same data type. Unlike some other formats you can not mix and match integers (whole numbers) and floats (decimal numbers) in the same file. 
If you have this use case consider splitting files by data type and using a catalog like STAC to keep track of them, or look at other formats like [Zarr](../zarr/intro.qmd).\n", "\n", "Scenario: If the COG is intended only for visualization, conversion to 3 band byte will improve performance. \n", "\n", + "> GDAL supported Data Types [list](https://gdal.org/drivers/raster/gtiff.html#gtiff-geotiff-file-format)\n", + "\n", + "\n", "# Compression (aka File Size)\n", "\n", - "The biggest benefit to compression is on the storage side. It’s always recommended to use a lossless compression method. Deflate or LZW are the most recommended compression algorithms, there are some choices that depend on the data type and distribution, and if the goal is maximum compression or not. Maximum compression does result in some performance loss." + "The biggest benefit to compression is on the storage side. It’s always recommended to use a lossless compression method. **Deflate** or **LZW** are the most recommended compression algorithms, there are some choices that depend on the data type and distribution, and if the goal is maximum compression or not. Maximum compression does result in some performance loss." ] }, { @@ -755,7 +764,7 @@ "\n", "# Projection\n", "\n", - "Read performance can be greatly impacted by the choice of projection and the particular applications used for dynamic tile serving. Using a known CRS defined in the PROJ database (typically EPSG code) is preferred over custom projections. Load times can be 5-20 times greater when using a custom projection. Whenever applying projections make sure to use WKT2 representation. If using a database of known projections this should be fine, there are known issues around manually setting proj-strings.\n" + "Read performance can be greatly impacted by the choice of projection and the particular applications used for dynamic tile serving. 
Using a known CRS defined in the PROJ database (typically EPSG code) is preferred over custom projections. Load times can be 5-20 times greater when using a custom projection. Whenever applying projections make sure to use WKT2 representation. If using a database of known projections, i.e. EPSG codes, this should be fine; there are known issues around manually setting proj-strings.\n" ] }, { @@ -768,7 +777,7 @@ "* The optimum size of data at which splitting across files improves performance as a multi-file dataset instead of a single file.\n", "* When to recommend particular internal tile sizes\n", "* Compression impacts on http transfer rates.\n", - "* Support for COG creation in other common scientific platforms (e.g. R)\n" + "* Support for COG creation varies across common geospatial tools.\n" ] }, { @@ -785,7 +794,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "Python 3.10.12 64-bit", "language": "python", "name": "python3" }, @@ -799,7 +808,12 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.4" + "version": "3.10.12" + }, + "vscode": { + "interpreter": { + "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" + } } }, "nbformat": 4, diff --git a/cloud-optimized-geotiffs/intro.ipynb b/cloud-optimized-geotiffs/intro.ipynb index c2c99d5..a35c924 100644 --- a/cloud-optimized-geotiffs/intro.ipynb +++ b/cloud-optimized-geotiffs/intro.ipynb @@ -9,7 +9,7 @@ "\n", "## What is a Cloud-Optimized GeoTIFF?\n", "\n", - "Cloud-Optimized GeoTIFF (the COG) is a variant of the TIFF image format that specifies a particular layout of internal data in the GeoTIFF specification to allow for optimized (subsetted or aggregated) access over a network for display or data reading.
The key components are overviews, and internal tiling.\n", + "Cloud-Optimized GeoTIFF (the COG), a raster format, is a variant of the TIFF image format that specifies a particular layout of internal data in the GeoTIFF specification to allow for optimized (subsetted or aggregated) access over a network for display or data reading. The key components are overviews, and internal tiling.\n", "\n", "For more details see [https://www.cogeo.org/](https://www.cogeo.org/)\n", "\n", @@ -17,14 +17,14 @@ "\n", "### Dimensions\n", "\n", - "This attribute is also sometimes called **chunks** or **internal tiles**.\n", - "\n", "Dimensions are the number of bands, rows and columns stored in a GeoTIFF. There is a tradeoff between storing lots of data in one GeoTIFF and storing less data in many GeoTIFFs. The larger a single file, the larger the GeoTIFF header and the multiple requests may be required just to read the spatial index before data retrieval. The opposite problem occurs if you make too many small files, then it takes many reads to retrieve data, and when rendering a combined visualization can greatly impact load time.\n", "\n", "If you plan to pan and zoom a large amount of data through a tiling service in a web browser, there is a tradeoff between 1 large file, or many smaller files. The current recommendation is to meet somewhere in the middle, a moderate amount of medium files.\n", "\n", "### Internal Blocks\n", "\n", + "> This attribute is also sometimes called **chunks** or **internal tiles**.\n", + "\n", "Internal blocks are required if the dimensions of data are over 512x512. However you can control the size of the internal blocks. 256x256 or 512x512 are recommended. When displaying data at full resolution, or doing partial reading of data this size will impact the number of reads required. A size of 256 will take less time to read, and read less data outside the desired bounding box, however for reading large parts of a file, it may take more total read requests. 
Some clients will aggregate neighboring block reads to reduce the total number of requests.\n", "\n", "### Overviews\n", @@ -34,7 +34,7 @@ "\n", "The smallest size overview should match the tiling components’ fetch size, typically 256x256. Due to aspect ratio variation just aim to have at least one dimension at or slightly less than 256. The COG driver in GDAL, or rio cogeo tools should do this.\n", "\n", - "There are many resampling algorithms for generating overviews. When creating overviews several options should be compared before deciding which resampling method to apply." + "There are many [resampling algorithms](https://gdal.org/programs/gdal_translate.html#cmdoption-gdal_translate-r) for generating overviews. When creating overviews several options should be compared before deciding which resampling method to apply." ] }, { @@ -60,18 +60,20 @@ "* [Development Seed Blog: Do you really want people using your data?](https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f)\n", "\n", "## How to visualize COGs\n", + "\n", "\n", - "* GDAL vis* drivers (vsicurl, vsis3, vsiaz,)\n", "* Titiler https://github.com/developmentseed/titiler\n", - "* Rio-viz https://github.com/developmentseed/rio-viz" + "* Rio-viz https://github.com/developmentseed/rio-viz\n", + "* GDAL vsi* drivers (vsicurl, vsis3, vsiaz) \n", + "* Open in your favorite Desktop GIS or Remote Sensing Application" ] } ], "metadata": { "kernelspec": { - "display_name": "Python [conda env:geospatial]", + "display_name": "Python 3.10.12 64-bit", "language": "python", - "name": "conda-env-geospatial-py" + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -83,7 +85,12 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.6" + "version": "3.10.12" + }, + "vscode": { + "interpreter": { + "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" + } } }, "nbformat": 4,