fix: Markdown link syntax #56

Merged
merged 10 commits into from
Sep 18, 2023
2 changes: 1 addition & 1 deletion .github/workflows/preview.yml
@@ -24,4 +24,4 @@ jobs:
- name: Deploy preview
uses: rossjrw/pr-preview-action@v1
with:
source-dir: docs
source-dir: _site
98 changes: 98 additions & 0 deletions cloud-optimized-geotiffs/cogs-details.ipynb
@@ -0,0 +1,98 @@
{
Member commented:

Can we change this to be a markdown file? It doesn't look like it has any executable code blocks

"cells": [
{
"cell_type": "markdown",
"id": "e84fbc0b",
"metadata": {},
"source": [
"# COG Format Details\n",
"\n",
"In the [COG Intro](intro.ipynb) you can see what makes a COG different from non-optimized GeoTIFFs. The rest of this page covers additional GeoTIFF details that help make your files as useful as possible but are not COG requirements."
]
},
{
"cell_type": "markdown",
"id": "0d780549-1ffb-45cd-a7ac-969072d46137",
"metadata": {},
"source": [
"# Data Type\n",
"\n",
"**Recommendation:** Use the smallest data type that still represents the data appropriately. Shifting data from float to integer by multiplying (a space-saving technique) is generally not recommended, because end users then need to undo this step to use the data. Compression is preferred; see also [Compression](#compression).\n",
"\n",
"The GeoTIFF format supports many data types, but all bands in a file must share the same data type. Unlike some other formats, you cannot mix integers (whole numbers) and floats (decimal numbers) in the same file. If you have this use case, consider splitting files by data type and using a catalog like STAC to keep track of them, or look at other formats like [Zarr](../zarr/intro.qmd).\n",
"\n",
"Scenario: If the COG is intended only for visualization, conversion to 3-band byte will improve performance.\n",
"\n",
"> GDAL supported Data Types [list](https://gdal.org/drivers/raster/gtiff.html#gtiff-geotiff-file-format)\n",
"\n",
"\n",
"# Compression (aka File Size)\n",
"\n",
"The biggest benefit of compression is on the storage side. A lossless compression method is always recommended. **Deflate** or **LZW** are the most recommended compression algorithms; the best choice depends on the data type and distribution, and on whether maximum compression is the goal. Maximum compression does result in some performance loss."
]
},
{
"cell_type": "markdown",
"id": "9fbe71f0-20b9-4a1b-8e7f-de52a90cc7c6",
"metadata": {},
"source": [
"# No Data\n",
"Setting a no data value makes it clear to users and visualization tools which pixels are not actually data. For visualization, this allows these pixels to be easily hidden (made transparent). Historically, many values have been used (0, -9999, etc.). The key is to make sure the GDAL nodata flag is set. It is also suggested that the smallest (most negative) value of the data type be used instead of an arbitrary value. For byte and unsigned integer types this will be 0; if 0 has meaning in your data, use a different value (like the maximum possible value). Having the right nodata flag set is important for overview generation.\n",
"\n",
"# Projection\n",
"\n",
"Read performance can be greatly impacted by the choice of projection and by the particular applications used for dynamic tile serving. Using a known CRS defined in the PROJ database (typically an EPSG code) is preferred over custom projections. Load times can be 5-20 times greater when using a custom projection. Whenever applying projections, make sure to use the WKT2 representation. Using a database of known projections (i.e. EPSG codes) should be fine, but there are known issues around manually setting proj-strings.\n"
]
},
{
"cell_type": "markdown",
"id": "1e9d49cc-7e1a-4e7b-94a3-539b8e51f0c7",
"metadata": {},
"source": [
"## What we don’t know (areas of research)\n",
"\n",
"* The optimum size of data at which splitting across files improves performance as a multi-file dataset instead of a single file.\n",
"* When to recommend particular internal tile sizes.\n",
"* Compression impacts on HTTP transfer rates.\n",
"* Support for COG creation in all common Geospatial tools varies.\n"
]
},
{
"cell_type": "markdown",
"id": "d193ab02-bb69-455e-9b72-5b89728f086e",
"metadata": {},
"source": [
"## Additional Resources\n",
"\n",
"* [An Introduction to Cloud Optimized GeoTIFFS (COGs) Part 1: Overview](https://developers.planet.com/docs/planetschool/an-introduction-to-cloud-optimized-geotiffs-cogs-part-1-overview/)\n",
"* [Do you really want people using your data?](https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.10.12 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"vscode": {
"interpreter": {
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}
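
The No Data guidance in `cogs-details.ipynb` above can be sketched in plain Python. This is only an illustration, not part of the PR: the helper name `suggested_nodata` and the dtype table are hypothetical, and it assumes that "smallest negative value" means the lowest value the data type can represent.

```python
import struct
import sys

# Bit widths for common integer types. The names are only labels in this
# sketch (hypothetical table, not taken from the notebook).
SIGNED_BITS = {"int8": 8, "int16": 16, "int32": 32}
UNSIGNED_BITS = {"uint8": 8, "uint16": 16, "uint32": 32}


def suggested_nodata(dtype: str, zero_is_valid: bool = False):
    """Return a nodata candidate following the guidance in the No Data cell."""
    if dtype in UNSIGNED_BITS:
        # Unsigned types: 0, unless 0 has meaning, then the maximum value.
        return 2 ** UNSIGNED_BITS[dtype] - 1 if zero_is_valid else 0
    if dtype in SIGNED_BITS:
        # Signed integers: the most negative representable value.
        return -(2 ** (SIGNED_BITS[dtype] - 1))
    if dtype == "float32":
        # Largest finite float32 (bit pattern 0x7F7FFFFF), negated.
        return -struct.unpack("<f", bytes.fromhex("ffff7f7f"))[0]
    if dtype == "float64":
        return -sys.float_info.max
    raise ValueError(f"unsupported dtype: {dtype}")


print(suggested_nodata("uint8"))                      # byte data
print(suggested_nodata("uint8", zero_is_valid=True))  # 0 is real data
print(suggested_nodata("int16"))
```

Whatever value is chosen still has to be recorded as the nodata flag in the file itself; the sketch only picks the number.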
92 changes: 24 additions & 68 deletions cloud-optimized-geotiffs/cogs-examples.ipynb
@@ -173,6 +173,14 @@
"veg_gtiff_filename = f\"{test_data_dir}/{veg_files[0]}\""
]
},
{
"cell_type": "markdown",
"id": "13da6953",
"metadata": {},
"source": [
"> To learn more about the example data see the [Vegetation Continuous Fields (VCF) information page](https://lpdaac.usgs.gov/products/vcf5kyrv001/)."
]
},
{
"cell_type": "markdown",
"id": "451dbe01",
@@ -297,17 +305,13 @@
"id": "ff2cc531",
"metadata": {},
"source": [
"## Dimensions\n",
"## Data Structure\n",
"\n",
"This attribute is also sometimes called **chunks** or **internal tiles**.\n",
"**Dimensions**\n",
"Dimensions are the number of bands, rows and columns stored in a GeoTIFF. [More Info](intro.ipynb#dimensions)\n",
"\n",
"Dimensions are the number of bands, rows and columns stored in a GeoTIFF. There is a tradeoff between storing lots of data in one GeoTIFF and storing less data in many GeoTIFFs. The larger a single file, the larger the GeoTIFF header and the multiple requests may be required just to read the spatial index before data retrieval. The opposite problem occurs if you make too many small files, then it takes many reads to retrieve data, and when rendering a combined visualization can greatly impact load time.\n",
"\n",
"If you plan to pan and zoom a large amount of data through a tiling service in a web browser, there is a tradeoff between 1 large file, or many smaller files. The current recommendation is to meet somewhere in the middle, a moderate amount of medium files.\n",
"\n",
"### Internal Blocks\n",
"\n",
"Internal blocks are required if the dimensions of data are over 512x512. However you can control the size of the internal blocks. 256x256 or 512x512 are recommended. When displaying data at full resolution, or doing partial reading of data this size will impact the number of reads required. A size of 256 will take less time to read, and read less data outside the desired bounding box, however for reading large parts of a file, it may take more total read requests. Some clients will aggregate neighboring block reads to reduce the total number of requests. \n",
"**Internal Blocks** (aka chunks or internal tiles)\n",
"Internal blocks are required if the dimensions of data are over 512x512. [More Info](intro.ipynb#internal-blocks)\n",
"\n",
"Let's check out the dimensions and blocks of our GeoTIFF and Cloud-Optimized GeoTIFF."
]
@@ -423,15 +427,15 @@
"## Overviews\n",
"\n",
"Overviews are downsampled (aggregated) data intended for visualization.\n",
"The best resampling algorithm depends on the range, type, and distribution of the data.\n",
"\n",
"The smallest size overview should match the tiling components’ fetch size, typically 256x256. Due to aspect ratio variation just aim to have at least one dimension at or slightly less than 256. The COG driver in GDAL, or rio cogeo tools should do this.\n",
"The smallest size overview should match the tiling components’ fetch size, typically 256x256. Due to aspect ratio variation just aim to have at least one dimension at or slightly less than 256. \n",
"> The COG driver in GDAL, or rio cogeo tools should do this.\n",
"\n",
"There are many resampling algorithms for generating overviews. When creating overviews several options should be compared before deciding which resampling method to apply.\n",
"There are many resampling algorithms for generating overviews. The best resampling algorithm depends on the range, type, and distribution of the data. When creating overviews several options should be compared before deciding which resampling method to apply. \n",
"\n",
"GDAL >= 3.2 allows for the overview resampling method to be set directly.\n",
"\n",
"TODO: need to add hints on how to check which resampling method to use for overviews. Possibly provide code for comparing."
"<!-- TODO: need to add hints on how to check which resampling method to use for overviews. Possibly provide code for comparing. Alex has a draft of this to add.-->"
]
},
{
@@ -728,64 +732,11 @@
"source": [
"show_overviews(tmp_cog)"
]
},
{
"cell_type": "markdown",
"id": "0d780549-1ffb-45cd-a7ac-969072d46137",
"metadata": {},
"source": [
"# Data Type\n",
"\n",
"The smallest possible data type, that still represents the data appropriately, should be used. It is not generally recommended to shift data from float to integer by multiplying, a space saving technique, as end users then need to undo this step to use the data. Data compression is preferred, see also Compression.\n",
"\n",
"Scenario: If the COG is intended only for visualization, conversion to 3 band byte will improve performance. \n",
"\n",
"# Compression (aka File Size)\n",
"\n",
"The biggest benefit to compression is on the storage side. It’s always recommended to use a lossless compression method. Deflate or LZW are the most recommended compression algorithms, there are some choices that depend on the data type and distribution, and if the goal is maximum compression or not. Maximum compression does result in some performance loss."
]
},
{
"cell_type": "markdown",
"id": "9fbe71f0-20b9-4a1b-8e7f-de52a90cc7c6",
"metadata": {},
"source": [
"# No Data\n",
"Setting a no data value makes it clear to users and visualization tools what pixels are not actually data. For visualization this allows these pixels to be easily hidden (transparent). Historically many values have been used, 0, -9999, etc… The key is to make sure the GDAL flag for no data is set. It is also suggested that the smallest negative value be used instead of a random value. For byte and unsigned integers/floats this will be 0, if 0 has meaning in your data use a different value (like the max possible value). Having the right nodata flag set is important for overview generation.\n",
"\n",
"# Projection\n",
"\n",
"Read performance can be greatly impacted by the choice of projection and the particular applications used for dynamic tile serving. Using a known CRS defined in the PROJ database (typically EPSG code) is preferred over custom projections. Load times can be 5-20 times greater when using a custom projection. Whenever applying projections make sure to use WKT2 representation. If using a database of known projections this should be fine, there are known issues around manually setting proj-strings.\n"
]
},
{
"cell_type": "markdown",
"id": "1e9d49cc-7e1a-4e7b-94a3-539b8e51f0c7",
"metadata": {},
"source": [
"## What we don’t know (areas of research)\n",
"\n",
"* The optimum size of data at which splitting across files improves performance as a multi-file dataset instead of a single file.\n",
"* When to recommend particular internal tile sizes\n",
"* Compression impacts on http transfer rates.\n",
"* Support for COG creation in other common scientific platforms (e.g. R)\n"
]
},
{
"cell_type": "markdown",
"id": "d193ab02-bb69-455e-9b72-5b89728f086e",
"metadata": {},
"source": [
"## Additional Resources\n",
"\n",
"* [An Introduction to Cloud Optimized GeoTIFFS (COGs) Part 1: Overview](https://developers.planet.com/docs/planetschool/an-introduction-to-cloud-optimized-geotiffs-cogs-part-1-overview/)\n",
"* [Do you really want people using your data?](https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3.10.12 64-bit",
"language": "python",
"name": "python3"
},
@@ -799,7 +750,12 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.10.12"
},
"vscode": {
"interpreter": {
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
}
}
},
"nbformat": 4,
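
The Overviews cell in `cogs-examples.ipynb` says the smallest overview should have at least one dimension at or slightly below the fetch size (typically 256). A minimal sketch of that rule follows, assuming power-of-two decimation; the function name `overview_factors` is hypothetical, and the GDAL COG driver or rio-cogeo may apply a different stopping rule.

```python
def overview_factors(width: int, height: int, target: int = 256) -> list[int]:
    """Power-of-two decimation factors until one dimension is <= target."""
    factors = []
    factor = 1
    # Integer ceiling division gives the overview size at each factor.
    while min(-(-width // factor), -(-height // factor)) > target:
        factor *= 2
        factors.append(factor)
    return factors


# A 4096 x 2048 raster: the last overview (factor 8) is 512 x 256.
print(overview_factors(4096, 2048))
# A raster already smaller than the target needs no overviews.
print(overview_factors(200, 200))
```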
27 changes: 17 additions & 10 deletions cloud-optimized-geotiffs/intro.ipynb
@@ -9,22 +9,22 @@
"\n",
"## What is a Cloud-Optimized GeoTIFF?\n",
"\n",
"Cloud-Optimized GeoTIFF (the COG) is a variant of the TIFF image format that specifies a particular layout of internal data in the GeoTIFF specification to allow for optimized (subsetted or aggregated) access over a network for display or data reading. The key components are overviews, and internal tiling.\n",
"Cloud-Optimized GeoTIFF (the COG), a raster format, is a variant of the TIFF image format that specifies a particular layout of internal data in the GeoTIFF specification to allow for optimized (subsetted or aggregated) access over a network for display or data reading. The key components are overviews, and internal tiling.\n",
"\n",
"For more details see https://www.cogeo.org/\n",
"For more details see [https://www.cogeo.org/](https://www.cogeo.org/)\n",
"\n",
"<img alt=\"COG Diagram\" src=\"../images/cog-diagram-1.png\" width=300/>\n",
"\n",
"### Dimensions\n",
"\n",
"This attribute is also sometimes called **chunks** or **internal tiles**.\n",
"\n",
"Dimensions are the number of bands, rows and columns stored in a GeoTIFF. There is a tradeoff between storing lots of data in one GeoTIFF and storing less data in many GeoTIFFs. The larger a single file, the larger the GeoTIFF header and the multiple requests may be required just to read the spatial index before data retrieval. The opposite problem occurs if you make too many small files, then it takes many reads to retrieve data, and when rendering a combined visualization can greatly impact load time.\n",
"\n",
"If you plan to pan and zoom a large amount of data through a tiling service in a web browser, there is a tradeoff between 1 large file, or many smaller files. The current recommendation is to meet somewhere in the middle, a moderate amount of medium files.\n",
"\n",
"### Internal Blocks\n",
"\n",
"> This attribute is also sometimes called **chunks** or **internal tiles**.\n",
"\n",
"Internal blocks are required if the dimensions of data are over 512x512. However you can control the size of the internal blocks. 256x256 or 512x512 are recommended. When displaying data at full resolution, or doing partial reading of data this size will impact the number of reads required. A size of 256 will take less time to read, and read less data outside the desired bounding box, however for reading large parts of a file, it may take more total read requests. Some clients will aggregate neighboring block reads to reduce the total number of requests.\n",
"\n",
"### Overviews\n",
@@ -34,7 +34,7 @@
"\n",
"The smallest size overview should match the tiling components’ fetch size, typically 256x256. Due to aspect ratio variation just aim to have at least one dimension at or slightly less than 256. The COG driver in GDAL, or rio cogeo tools should do this.\n",
"\n",
"There are many resampling algorithms for generating overviews. When creating overviews several options should be compared before deciding which resampling method to apply."
"There are many [resampling algorithms](https://gdal.org/programs/gdal_translate.html#cmdoption-gdal_translate-r) for generating overviews. When creating overviews several options should be compared before deciding which resampling method to apply."
]
},
{
@@ -60,18 +60,20 @@
"* [Development Seed Blog: Do you really want people using your data?](https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f)\n",
"\n",
"## How to visualize COGs\n",
"<!-- This is vague: TODO elaborate -->\n",
"\n",
"* GDAL vis* drivers (vsicurl, vsis3, vsiaz,)\n",
"* Titiler https://github.com/developmentseed/titiler\n",
"* Rio-viz https://github.com/developmentseed/rio-viz"
"* Rio-viz https://github.com/developmentseed/rio-viz\n",
"* GDAL vsi* drivers (vsicurl, vsis3, vsiaz)\n",
"* Open in your favorite Desktop GIS or Remote Sensing Application"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:geospatial]",
"display_name": "Python 3.10.12 64-bit",
"language": "python",
"name": "conda-env-geospatial-py"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -83,7 +85,12 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.10.12"
},
"vscode": {
"interpreter": {
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
}
}
},
"nbformat": 4,
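
The Internal Blocks trade-off described in `intro.ipynb` (256 blocks read less data outside the bounding box, but large reads need more requests) can be illustrated with simple arithmetic. This is a rough sketch, not part of the PR: `blocks_for_window` is a hypothetical helper, and it ignores partial blocks at the image edge, compression, and clients that merge neighboring block reads.

```python
def blocks_for_window(col_off: int, row_off: int, width: int, height: int,
                      block_size: int) -> tuple[int, int]:
    """Return (blocks touched, pixels fetched) for a pixel window."""
    first_col = col_off // block_size
    last_col = (col_off + width - 1) // block_size
    first_row = row_off // block_size
    last_row = (row_off + height - 1) // block_size
    n_blocks = (last_col - first_col + 1) * (last_row - first_row + 1)
    # Every touched block is fetched whole, even if only partly needed.
    return n_blocks, n_blocks * block_size * block_size


# Small 100 x 100 window: one block either way, but 512 fetches 4x the pixels.
print(blocks_for_window(300, 300, 100, 100, 256))
print(blocks_for_window(300, 300, 100, 100, 512))

# Large 1024 x 1024 window: same pixels, but 256 needs 16 reads versus 4.
print(blocks_for_window(0, 0, 1024, 1024, 256))
print(blocks_for_window(0, 0, 1024, 1024, 512))
```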