
Process & display the Circum-Arctic permafrost and ground ice map #41

Open
robyngit opened this issue Jun 26, 2023 · 25 comments
Labels: data available (The complete dataset is on the datateam server and ready to process) · layer (Displaying a specific data product in the PDG portal) · pdg (Permafrost Discovery Gateway) · Permafrost: Subsurface Features (data layer category)

Comments

@robyngit commented Jun 26, 2023

It was requested that we display a permafrost layer as the base layer when someone visits the PDG for the first time. The permafrost layer could be combined with the ice-wedge polygon map, so that two layers would be presented as base layers.

Feedback collected by Moss & Andrew (the K12 teacher in Bethel) indicated that people coming to the PDG do not immediately see or understand that the PDG's focus is on permafrost.

The ideal permafrost layer would be the Circum-Arctic Map of Permafrost and Ground-Ice Conditions, Version 2.

According to the metadata, it looks to be a 24 MB shapefile.

@robyngit robyngit added pdg Permafrost Discovery Gateway layer Displaying a specific data product in the PDG portal data available The complete dataset is on the datateam server and ready to process priority: high labels Jun 26, 2023
@mbjones commented Jun 26, 2023

This is a great idea. I'm curious how that relates to the two other general permafrost layers we have --

It seems like we could also show the Obo layers (soil temp and permafrost probability) on our imagery viewer.

@julietcohen

Initial Data Exploration

I agree that this layer looks like it would be very helpful for users to understand the IWP layer! The overall data directory consists of 3 shapefiles:

  • permaice.shp - 21 MB, 13,671 rows
    • contains attributes:
      • NUM_CODE - 0 NA values
      • COMBO - 0 NA values
      • RELICT - 13,658 NA values
      • EXTENT - 6,480 NA values
      • CONTENT - 6,480 NA values
      • LANDFORM - 6,639 NA values
    • I have yet to find definitions for each. (A sketch of how NA counts like these can be tallied follows this list.)
  • subsea.shp - 18 KB
  • treeline.shp - 16 KB
    Plus a few other small .byte and .hdr files.
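As an aside, a minimal sketch (assuming geopandas; the path is a placeholder for a local copy of the data) of how per-attribute NA counts like those above can be tallied:

count_na.py (sketch)
import geopandas as gpd

# placeholder path; point this at a local copy of the shapefile
perm = gpd.read_file("data/permaice.shp")

print(len(perm))  # number of rows (polygons)
# count missing values in each attribute column, excluding the geometry
print(perm.drop(columns="geometry").isna().sum())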

Notes:

  • The README in the data dir points to a documentation page that no longer exists.
  • The abstract specifies: "The data set also contains the location of subsea and relict permafrost."
  • The temporal coverage is "Not Specified" in the surface-level metadata, but the XML specifies 2002-02-01 as the dataset release date.

permaice.shp

image

subsea.shp:

image

treeline.shp:

image

@julietcohen

Update:

Started staging the file permaice.shp yesterday with:

  • no parallelization (small input data size)
  • no deduplication
  • max z-level 11 (for the first run, to see how it looks)
  • palette of purple with transparency, similar to the way we use yellow transparency for the IWP dataset

Staging has been ongoing on Datateam for almost a full day and is at >604,000 staged files. I will continue to let it run while I take care of other higher priority tasks.
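For reference, a minimal sketch of what a serial (non-parallel) staging run like this can look like. The config keys mirror the config.py posted later in this thread, but the values here (paths, extension) are illustrative placeholders rather than the exact settings used for this run:

stage_serial.py (illustrative sketch)
import pdgstaging

workflow_config = {
    "dir_input": "/path/to/input/",     # placeholder
    "ext_input": ".shp",
    "dir_staged": "/path/to/staged/",   # placeholder
    "tms_id": "WGS1984Quad",
    "z_range": [0, 11],                 # max z-level 11 for this first run
    "deduplicate_method": None,
    "deduplicate_at": None,
}

stager = pdgstaging.TileStager(workflow_config)
# stage every input file one at a time (no parallelization)
for path in stager.tiles.get_filenames_from_dir("input"):
    stager.stage(path)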

@julietcohen

Update:

  • Within the same job mentioned in the previous comment, staging completed for permaice.shp with 1,653,964 gpkg files, and the rasterization step has started.
  • Plotted a few of these GeoPackages; it looks like each contains one to a few large polygons:
image image

@julietcohen

Workflow update

I set up a parsl job on Datateam to execute rasterization and web-tiling in parallel, since rasterization was going very slowly without parallelization. After running over the weekend, rasters were produced for the staged tiles for z-11 (the highest z-level I set in the config), but a parsl error occurred during rasterization for z-level 10: parsl.executors.high_throughput.errors.WorkerLost: Task failure due to loss of worker 5 on host datateam

@julietcohen

I started a new parsl job to pick up where the workflow left off: restarting z-10, then the lower z-levels, then web-tiling.

@julietcohen

Color palette based on an attribute

Visualized the web-tiles for just z-11 and z-10 (since those are the only z-levels that I've been able to produce on Datateam so far without running into an OOM error, even with parallelization) using "coverage" for the statistic since this is just a first pass. I think it makes sense to instead use an attribute of the data, such as "EXTENT". The documentation explains what the codes for this attribute mean:

  • c = continuous
  • d = discontinuous
  • s = sporadic
  • i = isolated patches

47% of the observations have nan for this attribute. These would have to be removed before staging for the rasterization step, which cannot gracefully handle nan values yet.

Could also use the attribute "RELICT" which only contains "yes" or nan. 99% of those observations are nan so probably not a good path forward. If we assume (or find in the documentation) that nan means "no" for this attribute, we could use it instead of removing all those observations, but it seems like it would result in a very uniform and uninteresting palette.

@julietcohen

Update on dataset processing and assigning color palette to attribute

  • cleaned data to remove rows with nan for the "extent" attribute
    • this is necessary for rasterization (cannot resample rasters with NA cell/pixel values)
    • also helpfully reduces the amount of input data so the dataset can be processed on Datateam with parsl without running out of memory
  • assigned an integer code ranging from 1-4 to the categorical extent attribute with 4 categories (created a new integer attribute)
  • in config, made a statistic coverage_category with a pre-set range of values for that attribute: [1,4]

Over the weekend, the script staged and rasterized all tiles for all z-levels with no OOM errors. However, web-tiling failed with cannot convert float NaN to integer for every raster.

Debugging revealed that the issue was within viz-raster/pdgraster/WebImage.py, where we convert a raster to a Python Imaging Library image. Within to_image(), we create a no_data_mask that flags the values within the image data that have no data (0 in this case) and replace all those values with np.nan. I was able to resolve the error by converting the image_data array to float right before we create the no_data_mask.

to_image()
    def to_image(self, image_data):
        """
            Create a PIL image from the pixel values.

            Parameters
            ----------
            pixel_values : pandas.DataFrame
                A dataframe with a pixel_row, pixel_col, and values column.

            Returns
            -------
            PIL.Image
                A PIL image.
        """

        min_val = self.min_val
        max_val = self.max_val
        nodata_val = self.nodata_val
        image_data = image_data.copy()
        rgba_list = self.rgba_list
        height = self.height
        width = self.width

        # fix: cast to float so the nodata pixels below can be set to np.nan
        image_data = image_data.astype(float)
        no_data_mask = image_data == nodata_val

        # set the nodata value to np.nan
        if(len(no_data_mask)):
            image_data[no_data_mask] = np.nan
...

There will be more aspects of the viz-workflow to tweak before this dataset is complete, such as the palette. Hopefully, this is one step closer to adjusting the workflow to enable styling the 3D tiles and web tiles based on an attribute of interest (see viz-workflow issue #9).
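The underlying reason for the fix: NumPy integer arrays cannot hold np.nan, so masking the nodata value into an integer raster raises exactly this error. A minimal sketch reproducing the behavior and the fix:

nan_into_int_array.py (sketch)
import numpy as np

image_data = np.array([[0, 3], [2, 0]])  # integer raster, 0 = nodata
no_data_mask = image_data == 0

try:
    image_data[no_data_mask] = np.nan    # fails on an integer array
except ValueError as err:
    print(err)                           # "cannot convert float NaN to integer"

image_data = image_data.astype(float)    # the fix applied in to_image()
image_data[no_data_mask] = np.nan        # now succeeds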

@julietcohen

Using Cesium to preview the web tiles, we see that the polygons that span certain Arctic latitudes are distorted into rings around the North Pole:

image

Perhaps the projection of the default TMS for the viz workflow distorts the geometries that cross the Arctic Circle (66 degrees north), as well as the geometries that cross the latitude of the thinner, more northern ring.

These rings are not present in the input data to the visualization workflow. When we plot both the raw data and the cleaned input data, with the palette applied to the "EXTENT" attribute, we see a map without distortion. Here's the cleaned data with a viridis palette:
image

These rings would certainly make the output data confusing for users on the PDG portal.

@julietcohen

Rings in tileset result from reprojecting the geometries to the CRS of the TMS

After re-projecting the CRS of the cleaned geodataframe to the CRS of the TMS, the data looks like:
image

Because we need to use this TMS, a possible solution is to edit the input data of the viz workflow: convert the CRS of the geopackage beforehand, mask it to retain only polygons that fall over land, then feed that geodataframe into the viz workflow.
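A minimal sketch of that pre-processing idea, assuming geopandas and a locally available land-polygon layer (the land_polygons.gpkg file here is hypothetical):

preclip_to_land.py (hypothetical sketch)
import geopandas as gpd
import morecantile

perm = gpd.read_file("/home/jcohen/permafrost_ground_layer/data/permaice.shp")

# reproject to the CRS of the TMS used by the viz workflow
tms = morecantile.tms.get("WGS1984Quad")
perm = perm.to_crs(tms.crs)

# hypothetical land-polygon layer (e.g. derived from Natural Earth), same CRS
land = gpd.read_file("land_polygons.gpkg").to_crs(tms.crs)

# keep only the parts of the permafrost polygons that fall over land,
# discarding the ring-like artifacts introduced by the reprojection
perm_land = gpd.clip(perm, land)
perm_land.to_file("permaice_land_only.gpkg", driver="GPKG")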

@julietcohen

Removed Invalid Geometries

The ring-like invalid geometries shown above resulted from the conversion to the CRS of the TMS during staging, and only from the geometries that intersect the antimeridian. There were 6 such geometries, and removing them before starting the workflow resulted in the following tiles:
image

The following script was used to process the data before executing the workflow.

clean_data.py
# Clean permaice data from Brown et al. 2002
# Data source: https://nsidc.org/data/ggd318/versions/2
# Author: Juliet Cohen
# Date: 2023-08-09

# Cleaning steps include:
# 1. remove NA values from extent attribute
# 2. add a dummy coding for the extent attribute
# 3. remove polygons that intersect the antimeridian

# conda env: viz_3-10_local

import geopandas as gpd
import numpy as np
import morecantile

input = "/home/jcohen/permafrost_ground_layer/data/permaice.shp"
perm = gpd.read_file(input)

# drop rows that have missing value for extent attribute
perm.dropna(subset = ['EXTENT'], inplace = True)

# add column that codes the categorical extent strings into numbers 
# in order to do stats with the workflow and assign palette to this
# first, define the conditions and choices for new extent_code attribute
conditions = [
    (perm['EXTENT'] == "C"),
    (perm['EXTENT'] == "D"),
    (perm['EXTENT'] == "S"),
    (perm['EXTENT'] == "I")
]

choices = [4, 3, 2, 1]

# Use numpy.select() to assign extent_code based on the conditions and choices
perm['extent_code'] = np.select(conditions, choices)

# convert the CRS to WGS84 to distort the geometries that intersect
# the antimeridian, so we can identify which polygons are problematic
# pull the CRS from the TMS
tms = morecantile.tms.get("WGS1984Quad")
workflow_tms = tms.crs
perm.to_crs(workflow_tms, inplace = True)

# sort the geoms based on the largest longitudinal extent (wrap around the world)
# subtract the minimum longitude value (minx) from the maximum longitude value (maxx) 
perm['longitude_extent'] = perm['geometry'].apply(lambda geom: geom.bounds[2] - geom.bounds[0])
sorted_gdf = perm.sort_values(by = 'longitude_extent', ascending = False)

# define the problematic polygons as the polygons with a longitude extent greater than 180
antimeridian_polys = [index for index, row in sorted_gdf.iterrows() if row['longitude_extent'] > 180]

# drop the antimeridian-crossing polygons from the gdf, keeping only the non-problematic ones
nonproblematic_polys = perm.drop(antimeridian_polys)

# drop the column created for longitude_extent, no longer necessary
nonproblematic_polys.drop(columns=['longitude_extent'], inplace = True)

# save the nonproblematic polygons as a cleaned input file for the viz workflow
output_file = "/home/jcohen/permafrost_ground_layer/cleaned_data/permaice_clean.gpkg"
nonproblematic_polys.to_file(output_file, driver = "GPKG")

print("Cleaning complete.")

Next Steps

The final details to work out are:

  1. Split the 6 polygons at the antimeridian rather than removing them
  • could write a custom function to split / shift the polygon vertices, or convert the data to GeoJSON and use the antimeridian package (a hypothetical sketch using that package appears at the end of this comment)
  2. Assign a palette (by adjusting the config and re-executing just web-tiling) to properly represent all the values of the attribute of interest (the extent dummy coding)
  • using both "aggregation_method": "max" (shown in the screenshot) and "aggregation_method": "mean" shows mostly web tiles of one color, with slight variation:
image
  • Using palette:
"palette": [
            "#e41ba866",
            "#af18cd66"
        ]

Note: the 66 suffix = 40% alpha, which looks good considering we want users to overlay IWP on this layer and still see the terrain.

There should be 4 values that are represented:
image
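For the GeoJSON / antimeridian-package option in step 1, here is a hypothetical sketch (not tested here). It assumes the third-party antimeridian package and that its fix_polygon() accepts a shapely polygon in lon/lat coordinates and returns the geometry split at the antimeridian; the file names are placeholders:

antimeridian_pkg_sketch.py (hypothetical)
import antimeridian
import geopandas as gpd

# hypothetical GeoPackage holding only the 6 antimeridian-crossing polygons
polys = gpd.read_file("antimeridian_polygons.gpkg")

# reprojecting to WGS84 produces the wrapped/deformed geometries ...
polys = polys.to_crs("EPSG:4326")

# ... which fix_polygon() attempts to repair by splitting at the antimeridian
polys["geometry"] = polys.geometry.apply(antimeridian.fix_polygon)
polys.to_file("antimeridian_polygons_split.gpkg", driver="GPKG")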

@julietcohen commented Aug 18, 2023

First draft ready for the PDG

The palette now displays all 4 colors. The data package on the ADC will have the one input file from Brown et al. 2002, the geopackages, and the rasters. The package must be published with a new DOI (A2MG7FX35) before we can move the layer onto production so that users can access the data/metadata. Per Matt's suggestion, the ADC metadata will include provenance triples indicating the derivation relationship with the original dataset at the National Snow and Ice Data Center.

image image

Web tiles need to be re-processed with a blue palette, per Anna's request, because this dataset is usually visualized in blue by others.

@julietcohen commented Aug 23, 2023

Layer with blue palette

image

Layer with the default 40% opacity and IWP overlaid:
image

When this layer was first uploaded to demo, the incorrect legend order revealed a bug, described in this issue. A fix in the XML was made so the value of each legend item determines the order (1,2,3,4), rather than the string label.

The data for this layer are archived at the ADC at /var/data/10.18739/A2MG7FX35:

  • cleaned geopackage permaice_clean.gpkg that is fed into the viz workflow
  • script that cleaned the input data
  • geopackages
  • geotiffs
  • associated csv files for geopackages and geotiffs
  • the processing script to run the viz workflow
  • config

The web tiles are in /var/data/tiles/10.18739/A2MG7FX35/.

To do:

  • create metadata package on production
  • upload a single geopackage and a geotiff to create spatial metadata for them
  • split polygons at the antimeridian instead of removing them
  • convert these entities to ghost entities, remove the single geopackage and geotiff from the ADC package itself
  • point to the data archives at /var/data/10.18739/A2MG7FX35 in abstract so users know where to find the data
  • add provenance triples to point to the original data archived at the NSIDC
  • after publishing with DOI, transfer this layer to the production PDG portal

@julietcohen commented Sep 14, 2023

Update on splitting polygons that intersect the antimeridian

I created a script that:

  • identifies the polygons that cross the antimeridian without needing to transform them to WGS84 first
    • the data come in a Lambert Azimuthal projection, WGS84 is the CRS of the TMS, and transforming the polygons deforms them so they wrap around the world the opposite way
  • creates a line at (what seems like) the antimeridian in the CRS of the input data
  • splits the polygons at this line, and retains their other attributes
  • appends these polygons to the input data so they will be processed with the other polygons in the viz-workflow

Problem: After converting these split polygons to WGS84, they are still deformed in the same way (they wrap around the other side of the world). A potential reason is that the way I split the polygons leaves their split side touching or sharing the antimeridian, so the polygons still cannot be transformed to WGS84 correctly. This may be resolved by:

  • buffering the antimeridian line to shift the border of the polygon away from the antimeridian in all split polygons
  • or similarly, creating a long strip-like polygon at the antimeridian instead of a line, and using that to split the polygons

Defining the antimeridian in the CRS of the input data is the most complicated part of this process. In WGS84, defining a line at the antimeridian is easy. However, the CRS of the input data is in units of meters. The CRS info is the following:

image

I tried converting the antimeridian line in WGS84 to the Lambert Azimuthal projection, but the values that are not 0 become infinite. The closest solution I found so far, after trial and error of changing the coordinates and mapping, was to manually define the LineString with 2 coordinate values: line_geometry = LineString([(0, -20000000), (0, 20000000)]). The +/- 20,000,000 values are derived from mapping the polygons in the Lambert Azimuthal projection and checking the values on the x-axis (this line looks like the antimeridian when plotted, and it splits the polygons where I would visually want them split, based on where the antimeridian crosses them).
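A small sketch, assuming pyproj, that supports this choice: points along longitude 180 (sampled away from the poles, where this projection blows up) land at x ≈ 0 in the data's CRS, so a vertical line through x = 0 is a reasonable stand-in for the antimeridian:

check_antimeridian_location.py (sketch)
import geopandas as gpd
from pyproj import Transformer

perm = gpd.read_file("/home/jcohen/permafrost_ground_layer/data/permaice.shp")

# transformer from lon/lat to the data's Lambert Azimuthal CRS
to_laea = Transformer.from_crs("EPSG:4326", perm.crs, always_xy=True)

# sample the antimeridian at latitudes away from the poles
for lat in (30, 50, 70, 85):
    x, y = to_laea.transform(180, lat)
    print(f"lat {lat}: x = {x:.1f} m, y = {y:.1f} m")  # x should be ~0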

clean_data_v2.py
# Clean permaice data from Brown et al. 2002
# Data source: https://nsidc.org/data/ggd318/versions/2
# Author: Juliet Cohen
# Date: 2023-09-13

# Cleaning steps include:
# 1. remove NA values from extent attribute
# 2. add a dummy coding for the extent attribute
# 3. split polygons that intersect the antimeridian

# conda env: viz_3-10_local

import geopandas as gpd
import pandas as pd
import numpy as np
from shapely.geometry import LineString
from shapely.ops import split

input = "/home/jcohen/permafrost_ground_layer/data/permaice.shp"
perm = gpd.read_file(input)

# drop rows that have missing value for extent attribute
perm.dropna(subset = ['EXTENT'], inplace = True)

# create new gdf of just the polygons that have a neg min longitude and a pos max longitude 
# this indicates that the polygon crosses the antimeridian or prime meridian (7 polygons)
meridian_polys = (perm['geometry'].bounds['minx'] < 0 ) & (perm['geometry'].bounds['maxx'] > 0)

# subset the data to just polygons that cross either of the meridians
# and retain all attributes by appending `.copy()`
prob_polys = perm[meridian_polys].copy()

# subset the prob_polys to those that cross antimeridian, not prime meridian
# the only polygon that crosses the prime meridian is at index 60
polys_AM = prob_polys.drop([60], inplace = False)

# remove the antimeridian polygons from the original gdf 
# so when the split ones are appended later, there won't be duplicates
perm.drop(polys_AM.index, inplace = True)

# SPLIT THE ANTIMERIDIAN POLYGONS:
# first create a line gdf at the 180th longitude,
# that's where the center is in this dataset according to metadata,
# but the units are meters, not degrees, so use 20,000,000 instead of 180
line_geometry = LineString([(0, -20000000), (0, 20000000)])
am = gpd.GeoDataFrame(geometry = [line_geometry])
# set the CRS to that of the polygons that need to be split
am.set_crs(polys_AM.crs, inplace = True)

# create empty lists to store the split polygons and their attributes
geoms_all = []
atts_all = []

# iterate over each geometry that crosses the antimeridian
for index, row in polys_AM.iterrows():
    # define the geometry and attributes separately
    geom = row['geometry']
    atts = row.drop('geometry')
    # append the atts to atts_all so we can append them to geoms after loop
    atts_all.append(atts)

    # return a boolean series that fits both the conditions:
    # 1) the line and geom do intersect, 
    # 2) the line and geom do not touch (share a boundary)
    line = am.loc[(am.geometry.intersects(geom)) & (~am.geometry.touches(geom))]
    # split the single geometry at the antimeridian,
    # outputting multiple geometries stored within a "geometrycollection" per row
    split_polys = split(geom, line.geometry.iloc[0])
    # add split polygons to the output list
    geoms_all.append(split_polys)

# create a GeoDataFrame from the split polygons
geoms_all_gdf = gpd.GeoDataFrame(geometry = geoms_all)
geoms_all_gdf.set_crs(polys_AM.crs, inplace = True)
# create a GeoDataFrame from the attributes of the split polygons
atts_all_gdf = pd.DataFrame(atts_all, index = [0, 1, 2, 3, 4, 5])
# combine the split polys with the attributes into one gdf
geoms_atts = pd.concat([geoms_all_gdf, atts_all_gdf], axis = 1)

# separate the polygons from "geometrycollection" to individual polygon geoms
# 5 of the polygons are split once, and 1 polygon is split twice
split_polys_all = geoms_atts.explode(ignore_index = True, index_parts = False)

# append the split polygons to the same gdf as other polygons
perm = pd.concat([perm, split_polys_all], ignore_index = True)

# add column that codes the categorical extent strings into numbers 
# in order to do stats with the workflow and assign palette to this
# first, define the conditions and choices for new extent_code attribute
conditions = [
    (perm['EXTENT'] == "C"),
    (perm['EXTENT'] == "D"),
    (perm['EXTENT'] == "S"),
    (perm['EXTENT'] == "I")
]

choices = [4, 3, 2, 1]

# Use numpy.select() to assign extent_code based on the conditions and choices
perm['extent_code'] = np.select(conditions, choices)

# save the nonproblematic polygons as a cleaned input file for the viz workflow
output_file = "/home/jcohen/permafrost_ground_layer/cleaned_data/permaice_clean_split.gpkg"
perm.to_file(output_file, driver = "GPKG")

print("Cleaning complete.")


@julietcohen commented Sep 15, 2023

Visuals for antimeridian polygon splitting

The first part of the script reads in the input data, removes NA values from the attribute we are visualizing, and identifies the polygons that intersect the antimeridian.

Part 1 of clean_data_v2.py
input = "/home/jcohen/permafrost_ground_layer/data/permaice.shp"
perm = gpd.read_file(input)

# drop rows that have missing value for extent attribute
perm.dropna(subset = ['EXTENT'], inplace = True)

# create new gdf of just the polygons that have a neg min longitude and a pos max longitude 
# this indicates that the polygon crosses the antimeridian or prime meridian (7 polygons)
meridian_polys = (perm['geometry'].bounds['minx'] < 0 ) & (perm['geometry'].bounds['maxx'] > 0)

# subset the data to just polygons that cross either of the meridians
# and retain all attributes by appending `.copy()`
prob_polys = perm[meridian_polys].copy()

# subset the prob_polys to those that cross antimeridian, not prime meridian
# the only polygon that crosses the prime meridian is at index 60
polys_AM = prob_polys.drop([60], inplace = False)

...

The polygons visualized at this stage:
image

The next steps in the cleaning script create a line that represents the antimeridian and splits the polygons there.

Part 2 of clean_data_v2.py
# remove the antimeridian polygons from the original gdf 
# so when the split ones are appended later, there won't be duplicates
perm.drop(polys_AM.index, inplace = True)

# SPLIT THE ANTIMERIDIAN POLYGONS:
# first create a line gdf at the 180th longitude,
# that's where the center is in this dataset according to metadata,
# but the units are meters, not degrees, so use 20,000,000 instead of 180
line_geometry = LineString([(0, -20000000), (0, 20000000)])
am = gpd.GeoDataFrame(geometry = [line_geometry])
# set the CRS to that of the polygons that need to be split
am.set_crs(polys_AM.crs, inplace = True)

# create empty lists to store the split polygons and their attributes
geoms_all = []
atts_all = []

# iterate over each geometry that crosses the antimeridian
for index, row in polys_AM.iterrows():
    # define the geometry and attributes separately
    geom = row['geometry']
    atts = row.drop('geometry')
    # append the atts to atts_all so we can append them to geoms after loop
    atts_all.append(atts)

    # return a boolean series that fits both the conditions:
    # 1) the line and geom do intersect, 
    # 2) the line and geom do not touch (share a boundary)
    line = am.loc[(am.geometry.intersects(geom)) & (~am.geometry.touches(geom))]
    # split the single geometry at the antimeridian,
    # outputting multiple geometries stored within a "geometrycollection" per row
    split_polys = split(geom, line.geometry.iloc[0])
    # convert the split polygons to a gdf in order to concatenate with the attributes
    #split_polys_gdf = gpd.GeoDataFrame(geometry = split_polys)
    #polys_with_atts = pd.concat([split_polys, atts], axis = 1)
    # add split polygons to the output list
    geoms_all.append(split_polys)

# create a GeoDataFrame from the split polygons
geoms_all_gdf = gpd.GeoDataFrame(geometry = geoms_all)
geoms_all_gdf.set_crs(polys_AM.crs, inplace = True)
# create a GeoDataFrame from the attributes of the split polygons
atts_all_gdf = pd.DataFrame(atts_all, index = [0, 1, 2, 3, 4, 5])
# combine the split polys with the attributes into one gdf
geoms_atts = pd.concat([geoms_all_gdf, atts_all_gdf], axis = 1)

# separate the polygons from "geometrycollection" to individual polygon geoms
# 5 of the polygons are split once, and 1 polygon is split twice
split_polys_all = geoms_atts.explode(ignore_index = True, index_parts = False)

...

The polygons visualized at this stage:

image

Then we can test if this splitting worked for our intended purposes by converting the polygons at this stage to WGS84 and plotting the result in the same way.

transform
# convert the polygons to WGS84 to see how they are still deformed
split_polys_all.to_crs("EPSG:4326", inplace = True)
image

They still wrap around the world the opposite way when converted to WGS84.

@robyngit

@julietcohen, just exploring the CRS from this data...

perm.crs.name
# 'Sphere_ARC_INFO_Lambert_Azimuthal_Equal_Area'

perm.crs.to_epsg()
# None
More CRS Info
perm.crs.to_json_dict()

# output:
{'$schema': 'https://proj.org/schemas/v0.7/projjson.schema.json',
 'type': 'ProjectedCRS',
 'name': 'Sphere_ARC_INFO_Lambert_Azimuthal_Equal_Area',
 'base_crs': {'name': 'GCS_Sphere_ARC_INFO',
  'datum': {'type': 'GeodeticReferenceFrame',
   'name': 'D_Sphere_ARC_INFO',
   'ellipsoid': {'name': 'Sphere_ARC_INFO', 'radius': 6370997}},
  'coordinate_system': {'subtype': 'ellipsoidal',
   'axis': [{'name': 'Longitude',
     'abbreviation': 'lon',
     'direction': 'east',
     'unit': {'type': 'AngularUnit',
      'name': 'Degree',
      'conversion_factor': 0.0174532925199433}},
    {'name': 'Latitude',
     'abbreviation': 'lat',
     'direction': 'north',
     'unit': {'type': 'AngularUnit',
      'name': 'Degree',
      'conversion_factor': 0.0174532925199433}}]}},
 'conversion': {'name': 'unnamed',
  'method': {'name': 'Lambert Azimuthal Equal Area (Spherical)',
   'id': {'authority': 'EPSG', 'code': 1027}},
  'parameters': [{'name': 'Latitude of natural origin',
    'value': 90,
    'unit': {'type': 'AngularUnit',
     'name': 'Degree',
     'conversion_factor': 0.0174532925199433},
    'id': {'authority': 'EPSG', 'code': 8801}},
   {'name': 'Longitude of natural origin',
    'value': 180,
    'unit': {'type': 'AngularUnit',
     'name': 'Degree',
     'conversion_factor': 0.0174532925199433},
    'id': {'authority': 'EPSG', 'code': 8802}},
   {'name': 'False easting',
    'value': 0,
    'unit': 'metre',
    'id': {'authority': 'EPSG', 'code': 8806}},
   {'name': 'False northing',
    'value': 0,
    'unit': 'metre',
    'id': {'authority': 'EPSG', 'code': 8807}}]},
 'coordinate_system': {'subtype': 'Cartesian',
  'axis': [{'name': 'Easting',
    'abbreviation': '',
    'direction': 'east',
    'unit': 'metre'},
   {'name': 'Northing',
    'abbreviation': '',
    'direction': 'north',
    'unit': 'metre'}]}}

This seems to be a projection without an EPSG code. If you try setting it to EPSG:9820 (perm.set_crs(9820, allow_override=True)), you get an error from the Proj library: CRSError: Invalid projection: EPSG:9820: (Internal Proj Error: proj_create: crs not found). I'm not sure 9820 is the identifier for this projection anyway. In their paper, the authors say they used the Lambert Azimuthal Equal Area Polar Projection. The only information I could find about this projection is that it is identified as SR-ORG:80, and someone had the exact issue you did with it in geopandas (see the geopandas issue "to_crs error for polygons crossing the -180 degree meridian"). They indicated that reprojecting in ArcMap works.

I'm not sure, but I have a feeling that some of the projection information needed is not handled properly by geopandas (or more accurately, by the PROJ library that geopandas uses under the hood). I wonder if it's possible to re-project in ArcMap first.

Also, I don't know if this might help in splitting the polygons, but there is some prime meridian info in the CRS object:

perm.crs.prime_meridian
# PRIMEM["Greenwich",0, ANGLEUNIT["Degree",0.0174532925199433]]

@julietcohen commented Sep 18, 2023

Thanks for the insight, @robyngit! That helps clarify the confusing outputs and info (and lack of info) that I found online about this projection as well. I think you're right that there's information that geopandas can't handle, because an error is also returned from reprojecting an antimeridian line in WGS84 to the CRS of the input data.

reproduce error
# raw input data
input = "/home/jcohen/permafrost_ground_layer/data/permaice.shp"
perm = gpd.read_file(input)

# drop rows that have missing value for extent attribute
# (subsets the data to only the polygons of interest)
perm.dropna(subset = ['EXTENT'], inplace = True)

# create new gdf of just the polygons that have a neg min longitude and a pos max longitude 
# this indicates that the polygon crosses the antimeridian or prime meridian (7 polygons)
meridian_polys = (perm['geometry'].bounds['minx'] < 0 ) & (perm['geometry'].bounds['maxx'] > 0)
# subset the data to just polygons that cross either of the merdidians
prob_polys = perm[meridian_polys].copy() 
# subset the prob_polys to those that cross antimeridian, not prime meridian
# the only polygon that crosses the prime meridian is at index 60
polys_AM = prob_polys.drop([60], inplace = False)

# create a LineString from the north pole to south pole along antimeridian
line = LineString([(180, 90), (180, -90)])

# Create a GeoDataFrame with the line
line = gpd.GeoDataFrame({'geometry': [line]}, crs = 'EPSG:4326')

# reproject to the crs of input data
# (note: to_crs expects a CRS such as polys_AM.crs; passing the GeoDataFrame
# itself is what raises the ValueError shown below)
line.to_crs(polys_AM, inplace=True)

output:
ValueError: The truth value of a GeoDataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I will look into your suggestion to reproject this shapefile in ArcMap.

@elongano elongano added Permafrost: Subsurface Features data layer category: Permafrost: Subsurface Features and removed priority: high labels Jan 29, 2024
@elongano elongano moved this to In Progress in Data Layers Jan 29, 2024
@julietcohen commented Mar 22, 2024

Update: splitting polygons to enable clean transformation to EPSG:4326

Converting the original data to EPSG:4326 in QGIS did not work because the uploaded file does not have a known EPSG code; the Lambert projection is labeled as a custom CRS. QGIS is different from ArcMap (although the specific differences are unknown to me), so there is a chance this may still work in ArcMap, but I do not have access to ArcMap at the moment (maybe I can get access through a coworker at NCEAS).

Instead of doing that CRS conversion for the data file with that software, I continued to try to split the polygons with a buffer at the antimeridian.

Attempted programmatic approaches

In order to split the polygons that cross the antimeridian, I tried to use geopandas and shapely. As noted earlier in this issue, simply splitting the polygons at a defined LineString for the antimeridian (in meters) did work in creating valid geometries that lie on either side of the meridian. However, when the geometries were then transformed into EPSG:4326 they were still deformed and wrapped the opposite way around the world, likely because they were not buffered away from the antimeridian, so the new split side of each polygon likely still intersected the antimeridian. It could also be that my definition of the LineString for the antimeridian in meters was slightly off. I suggested that this might be resolved by splitting the polygons with a buffered antimeridian LineString.

buffer AM line
# read in polygons that cross AM
polys = gpd.read_file("~/permafrost_ground_layer/split_AM_polys/AM_polygons.gpkg")

# first create a line gdf at the 180th longitude,
# that's where the center is in this dataset according to metadata,
# but the units are meters, not degrees, so use 20,000,000 instead of 180
line_geometry = LineString([(0, -20000000), (0, 20000000)])
am = gpd.GeoDataFrame(geometry = [line_geometry])
# set the CRS to that of the polygons that need to be split
am.set_crs(polys.crs, inplace = True)
buffer = 600 
am_buffered = am.buffer(buffer)

# create empty lists to store the split polygons and their attributes
geoms_all = []
atts_all = []

# iterate over each geometry that crosses the antimeridian
for index, row in polys.iterrows():
    # define the geometry and attributes separately
    geom = row['geometry']
    atts = row.drop('geometry')
    # append the atts to atts_all so we can append them to geoms after loop
    atts_all.append(atts)

    # return a boolean series that fits both the conditions:
    # 1) the line and geom do intersect, 
    # 2) the line and geom do not touch (share a boundary)
    line = am_buffered.loc[am_buffered.geometry.intersects(geom)]
    # split the single geometry at the buffered antimeridian,
    # outputting multiple geometries stored within a "geometrycollection" per row
    split_polys = split(geom, line.geometry.iloc[0])
    # convert the split polygons to a gdf in order to concatenate with the attributes
    #split_polys_gdf = gpd.GeoDataFrame(geometry = split_polys)
    #polys_with_atts = pd.concat([split_polys, atts], axis = 1)
    # add split polygons to the output list
    geoms_all.append(split_polys)

# create a GeoDataFrame from the split polygons
geoms_all_gdf = gpd.GeoDataFrame(geometry = geoms_all)
geoms_all_gdf.set_crs(polys.crs, inplace = True)

geoms_all_gdf_exploded = geoms_all_gdf.explode().reset_index(drop = True)
split_84 = geoms_all_gdf_exploded.to_crs(epsg="4326", inplace = False)

Unfortunately, an error resulted: GeometryTypeError: Splitting a Polygon with a Polygon is not supported.
I thought maybe QGIS could do this, but first I wanted to try another programmatic approach.

Next I tried to split the polygons with the original (not buffered) antimeridian LineString, then buffer the polygons' split side after they are split. But I could not figure out how to buffer only one side of a polygon. I experimented with geopandas.buffer():

# (fragment) for each piece of the split geometry that still touches the
# antimeridian line, try a one-sided buffer to push its edge off the line
for split_geom in split_polys.geoms:
        if split_geom.intersects(line.geometry.iloc[0]):
            buffered_split = split_geom.buffer(buffer_distance, single_sided = True)
            geoms_all.append(buffered_split)

The output does not look buffered when mapped, even when the buffer distance is huge.

QGIS approach

While I have not 100% given up hope that geopandas.buffer() may be a programmatic solution, I tested whether QGIS can split polygons with a buffered LineString, and it can! I uploaded 2 spatial files, the polygons that cross the antimeridian and the antimeridian line in units of meters, and buffered the antimeridian by 10,000 m. QGIS can then split the polygons where the buffered line (a polygon) is, and export the split polygons as a spatial file in the same "custom" Lambert projection.

image

Converting these geoms to EPSG:4326 shows no wrapping around the world:
image

@julietcohen

I am not sure how to define an object or a LineString as PRIMEM["Greenwich",0, ANGLEUNIT["Degree",0.0174532925199433]], which Robyn found for the Lambert projection metadata. I tried a few approaches including line_geometry = polys.crs.prime_meridian but errors are returned like TypeError: Input must be valid geometry objects: Greenwich

@julietcohen

I have come up with a new cleaning script that does the following steps programmatically:

  • ingests all the input data for the permafrost and ground ice dataset
  • removes NA values from the extent attribute
  • defines the antimeridian in the projected CRS of the input data and buffers it by only 1 meter
  • splits the polygons that cross the antimeridian
  • appends those split polygons to the others with their attributes retained
  • adds a dummy coding for the extent attribute
  • saves the GDF as a new file to use as input into the viz workflow
clean_perm_v3.py
# Clean permaice data from Brown et al. 2002
# Data source: https://nsidc.org/data/ggd318/versions/2
# Author: Juliet Cohen
# Date: 2024-05-01

# Cleaning steps include:
# 1. remove NA values from extent attribute
# 2. add a dummy coding for the extent attribute
# 3. split polygons that intersect the antimeridian

# conda env: viz_3-10_local

import geopandas as gpd
import pandas as pd
import numpy as np
from shapely.geometry import LineString


input = "/home/jcohen/permafrost_ground_layer/data/permaice.shp"
perm = gpd.read_file(input)

# drop rows that have missing value for extent attribute
# since this is the attribute to visualize
perm.dropna(subset = ['EXTENT'], inplace = True)

# create new gdf of just the polygons that have a neg min longitude and a pos max longitude 
# this indicates that the polygon crosses the antimeridian or prime meridian (7 polygons)
meridian_polys = (perm['geometry'].bounds['minx'] < 0 ) & (perm['geometry'].bounds['maxx'] > 0)

# subset the data to just polygons that cross either of the meridians
# and retain all attributes by appending `.copy()`
prob_polys = perm[meridian_polys].copy()

# subset the prob_polys to those that cross antimeridian, not prime meridian
# the only polygon that crosses the prime meridian is at index 60
polys_AM = prob_polys.drop([60], inplace = False)

# remove the antimeridian polygons from the original gdf 
# so when the split ones are appended later, there won't be duplicates
perm.drop(polys_AM.index, inplace = True)

# Split polygons that cross the antimeridian
# Step 1. create a line gdf at the 180th longitude,
#       that's where the center is in this dataset according to metadata,
#       the units are meters, not degrees, so use 20,000,000 instead of 180
am = gpd.GeoSeries(LineString([(0, -20000000), (0, 20000000)]))
am.set_crs(perm.crs, inplace = True)
# buffer the line with 1 meter (units of CRS) to convert it to a polygon
am_buffered = am.buffer(distance = 1, 
                        cap_style = 2, 
                        join_style = 0, 
                        mitre_limit = 2)

# create empty lists to store the split polygons and their attributes
all_data = []

# iterate over each geometry that crosses the antimeridian
for index, row in polys_AM.iterrows():
    # define the geometry and attributes separately
    geom = row['geometry']
    atts = gpd.GeoDataFrame(row.drop('geometry').to_frame().T)
    # split the single geometry with the buffered antimeridian GeoSeries,
    # outputting multiple geoms stored within a MultiPolygon
    split_geoms_MP = geom.difference(am_buffered)
    # make the index match the atts to concat correctly
    split_geoms_MP.index = atts.index
    # convert to GDF to define the geometry col
    split_geoms_MP_gdf = gpd.GeoDataFrame(geometry = split_geoms_MP)
    split_geoms_MP_gdf.set_crs(perm.crs, inplace = True)
    MP_with_atts = pd.concat([split_geoms_MP_gdf, atts], axis = 1)
    # MP_with_atts.reset_index(inplace = True) # not sure if i need this
    P_with_atts = MP_with_atts.explode(ignore_index = False,
                                       index_parts = False)
    # concatenate the exploded geometries with their attributes
    all_data.append(P_with_atts)

# create empty gdf to store final result
all_data_concat = gpd.GeoDataFrame()

# iterate over each gdf in all_data, concatenate into single gdf
for gdf in all_data:
    all_data_concat = pd.concat([all_data_concat, gdf], 
                                ignore_index = True)

all_data_concat.reset_index(drop = True, inplace = True)

# append the split polygons to the same gdf as other polygons
perm = pd.concat([perm, all_data_concat], ignore_index = True)

# add column that codes the categorical extent strings into numbers 
# in order to do stats with the workflow and assign palette to this
# first, define the conditions and choices for new extent_code attribute
conditions = [
    (perm['EXTENT'] == "C"),
    (perm['EXTENT'] == "D"),
    (perm['EXTENT'] == "S"),
    (perm['EXTENT'] == "I")
]
choices = [4, 3, 2, 1]

# assign extent_code based on the conditions and choices
perm['extent_code'] = np.select(conditions, choices)

# save as a cleaned input file for the viz workflow
output_file = "/home/jcohen/permafrost_ground_layer/split_AM_polys/cleaned_0501/permaice_clean_split.gpkg"
perm.to_file(output_file, driver = "GPKG")

print("Cleaning complete.")

In order to integrate this approach as a generalized step in viz-staging, applied before staging any input data that contains geometries crossing the antimeridian, I'll need to find a way to generalize how the antimeridian is defined. In this script, I define it as a LineString by explicitly specifying the 2 coordinates. Ideally, the LineString would be defined by fetching the CRS of the input data and deriving the antimeridian from that. The generalized step would also buffer the LineString by the minimum possible value for the distance argument of buffer(). I have not achieved this generalized approach yet, so this script will likely serve as the cleaning (pre-visualization) step for this dataset, and for other datasets we can integrate this functionality into viz-staging.
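One possible generalization, sketched below but not yet tested in viz-staging: densely sample longitude 180 in WGS84 (skipping the poles to avoid the infinities noted above) and transform those vertices into whatever CRS the input data uses with pyproj:

generalized_antimeridian.py (sketch)
import numpy as np
from pyproj import Transformer
from shapely.geometry import LineString

def antimeridian_in_crs(crs, lat_min=-89.0, lat_max=89.0, n=500):
    """Approximate the antimeridian (longitude 180) as a LineString in `crs`.

    The poles are excluded because some projections (including this dataset's
    Lambert Azimuthal projection) are undefined or infinite there.
    """
    lats = np.linspace(lat_min, lat_max, n)
    to_proj = Transformer.from_crs("EPSG:4326", crs, always_xy=True)
    xs, ys = to_proj.transform(np.full(n, 180.0), lats)
    return LineString(list(zip(xs, ys)))

# usage sketch: build the line in the permafrost data's CRS, then buffer it
# slightly before using it to split polygons, as in the script above
# am_line = antimeridian_in_crs(perm.crs)
# am_buffered = am_line.buffer(1)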

@julietcohen

I visualized the dataset that was cleaned with the antimeridian polygons split as described above, and uploaded it to the demo portal. I think the split polygons look great. The data was processed with the max z-level set to 10; the final processing run will use a higher z-level. The screenshot below is visualized with 65% opacity.

image
parsl_workflow.py
# Processing the permafrost and ground ice data layer 
# staging through web tiling with parsl
# conda environment: viz_3-10_local, with local installs
# for viz-staging and viz-raster

import pdgstaging
import pdgraster
import config

from datetime import datetime
import json
import logging
import logging.handlers
from pdgstaging import logging_config
import os

import parsl
from parsl import python_app
from parsl.config import Config
from parsl.channels import LocalChannel
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

import shutil
import subprocess
from subprocess import Popen
user = subprocess.check_output("whoami").strip().decode("ascii")

workflow_config = config.workflow_config

# start with a fresh directory!
print("Removing old directories and files...")
base_dir = "/home/jcohen/permafrost_ground_layer/may_runs/"
old_filepaths = [f"{base_dir}staging_summary.csv",
                f"{base_dir}raster_summary.csv",
                f"{base_dir}raster_events.csv",
                f"{base_dir}config__updated.json",
                f"{base_dir}log.log"]
for old_file in old_filepaths:
  if os.path.exists(old_file):
      os.remove(old_file)

# remove dirs from past run
old_dirs = [f"{base_dir}staged",
            f"{base_dir}geotiff",
            f"{base_dir}web_tiles",
            f"{base_dir}runinfo"]
for old_dir in old_dirs:
  if os.path.exists(old_dir) and os.path.isdir(old_dir):
      shutil.rmtree(old_dir)


activate_conda = 'source /home/jcohen/.bashrc; conda activate viz_3-10_local'
htex_local = Config(
    executors=[
        HighThroughputExecutor(
            label="htex_local",
            worker_debug=False,
            #cores_per_worker=1,
            max_workers=10,
            provider=LocalProvider(
                #channel=LocalChannel(),
                init_blocks=1,
                max_blocks=1,
                worker_init=activate_conda
            ),
        )
    ],
)
parsl.clear()
parsl.load(htex_local)

def run_pdg_workflow(
    workflow_config,
    batch_size = 300
):
    """
    Run the main PDG workflow for the following steps:
    1. staging
    2. raster highest
    3. raster lower
    4. web tiling

    Parameters
    ----------
    workflow_config : dict
        Configuration for the PDG staging workflow, tailored to rasterization and 
        web tiling steps only.
    batch_size: int
        How many staged files, geotiffs, or web tiles should be included in a single creation
        task? (each task is run in parallel) Default: 300
    """

    start_time = datetime.now()

    logging.info("Staging initiated.")

    stager = pdgstaging.TileStager(workflow_config)
    tile_manager = stager.tiles
    config_manager = stager.config

    input_paths = stager.tiles.get_filenames_from_dir('input')
    input_batches = make_batch(input_paths, batch_size)

    # Stage all the input files (each batch in parallel)
    app_futures = []
    for i, batch in enumerate(input_batches):
        app_future = stage(batch, workflow_config)
        app_futures.append(app_future)
        logging.info(f'Started job for batch {i} of {len(input_batches)}')

    # Don't continue to next step until all files have been staged
    [a.result() for a in app_futures]

    logging.info("Staging complete.")

    # ----------------------------------------------------------------

    # Create highest geotiffs 
    rasterizer = pdgraster.RasterTiler(workflow_config)

    # Process staged files in batches
    logging.info(f'Collecting staged file paths to process...')
    staged_paths = tile_manager.get_filenames_from_dir('staged')
    logging.info(f'Found {len(staged_paths)} staged files to process.')
    staged_batches = make_batch(staged_paths, batch_size)
    logging.info(f'Processing staged files in {len(staged_batches)} batches.')

    app_futures = []
    for i, batch in enumerate(staged_batches):
        app_future = create_highest_geotiffs(batch, workflow_config)
        app_futures.append(app_future)
        logging.info(f'Started job for batch {i} of {len(staged_batches)}')

    # Don't move on to next step until all geotiffs have been created
    [a.result() for a in app_futures]

    logging.info("Rasterization highest complete. Rasterizing lower z-levels.")

    # ----------------------------------------------------------------

    # Rasterize composite geotiffs
    min_z = config_manager.get_min_z()
    max_z = config_manager.get_max_z()
    parent_zs = range(max_z - 1, min_z - 1, -1)

    # Can't start lower z-level until higher z-level is complete.
    for z in parent_zs:

        # Determine which tiles we need to make for the next z-level based on the
        # path names of the geotiffs just created
        logging.info(f'Collecting highest geotiff paths to process...')
        child_paths = tile_manager.get_filenames_from_dir('geotiff', z = z + 1)
        logging.info(f'Found {len(child_paths)} highest geotiffs to process.')
        # create empty set for the following loop
        parent_tiles = set()
        for child_path in child_paths:
            parent_tile = tile_manager.get_parent_tile(child_path)
            parent_tiles.add(parent_tile)
        # convert the set into a list
        parent_tiles = list(parent_tiles)

        # Break all parent tiles at level z into batches
        parent_tile_batches = make_batch(parent_tiles, batch_size)
        logging.info(f'Processing highest geotiffs in {len(parent_tile_batches)} batches.')

        # Make the next level of parent tiles
        app_futures = []
        for parent_tile_batch in parent_tile_batches:
            app_future = create_composite_geotiffs(
                parent_tile_batch, workflow_config)
            app_futures.append(app_future)

        # Don't start the next z-level, and don't move to web tiling, until the
        # current z-level is complete
        [a.result() for a in app_futures]

    logging.info("Composite rasterization complete. Creating web tiles.")

    # ----------------------------------------------------------------

    # Process web tiles in batches
    logging.info(f'Collecting file paths of geotiffs to process...')
    geotiff_paths = tile_manager.get_filenames_from_dir('geotiff')
    logging.info(f'Found {len(geotiff_paths)} geotiffs to process.')
    geotiff_batches = make_batch(geotiff_paths, batch_size)
    logging.info(f'Processing geotiffs in {len(geotiff_batches)} batches.')

    app_futures = []
    for i, batch in enumerate(geotiff_batches):
        app_future = create_web_tiles(batch, workflow_config)
        app_futures.append(app_future)
        logging.info(f'Started job for batch {i} of {len(geotiff_batches)}')

    # Don't record end time until all web tiles have been created
    [a.result() for a in app_futures]

    end_time = datetime.now()
    logging.info(f'⏰ Total time to create all z-level geotiffs and web tiles: '
                 f'{end_time - start_time}')

# ----------------------------------------------------------------

# Define the parsl functions used in the workflow:

@python_app
def stage(paths, config):
    """
    Stage a file
    """
    from datetime import datetime
    import json
    import logging
    import logging.handlers
    import os
    import pdgstaging
    from pdgstaging import logging_config

    stager = pdgstaging.TileStager(config = config, check_footprints = False)
    for path in paths:
        stager.stage(path)
    return True

# Create highest z-level geotiffs from staged files
@python_app
def create_highest_geotiffs(staged_paths, config):
    """
    Create a batch of geotiffs from staged files
    """
    from datetime import datetime
    import json
    import logging
    import logging.handlers
    import os
    import pdgraster
    from pdgraster import logging_config

    # rasterize the vectors, highest z-level only
    rasterizer = pdgraster.RasterTiler(config)
    return rasterizer.rasterize_vectors(
        staged_paths, make_parents = False)

# ----------------------------------------------------------------

# Create composite geotiffs from highest z-level geotiffs 
@python_app
def create_composite_geotiffs(tiles, config):
    """
    Create a batch of composite geotiffs from highest geotiffs
    """
    from datetime import datetime
    import json
    import logging
    import logging.handlers
    import os
    import pdgraster
    from pdgraster import logging_config

    rasterizer = pdgraster.RasterTiler(config)
    return rasterizer.parent_geotiffs_from_children(
        tiles, recursive = False)

# ----------------------------------------------------------------

# Create a batch of webtiles from geotiffs
@python_app
def create_web_tiles(geotiff_paths, config):
    """
    Create a batch of webtiles from geotiffs
    """

    from datetime import datetime
    import json
    import logging
    import logging.handlers
    import os
    import pdgraster
    from pdgraster import logging_config

    rasterizer = pdgraster.RasterTiler(config)
    return rasterizer.webtiles_from_geotiffs(
        geotiff_paths, update_ranges = False)


def make_batch(items, batch_size):
    """
    Create batches of a given size from a list of items.
    """
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# ----------------------------------------------------------------

# run the workflow
# run_pdg_workflow(workflow_config)
if __name__ == "__main__":
    run_pdg_workflow(workflow_config)

# Shutdown and clear the parsl executor
htex_local.executors[0].shutdown()
parsl.clear()

# transfer log from /tmp to user dir
cmd = ['mv', '/tmp/log.log', f'/home/{user}/permafrost_ground_layer/may_runs/']
# initiate the process to run that command
process = Popen(cmd)

print("Script complete.")
config.py
workflow_config = {
    "deduplicate_clip_to_footprint": False,
    "dir_input": "/home/jcohen/permafrost_ground_layer/cleaned_v3/",
    "ext_input": ".gpkg",
    "dir_staged": "/home/jcohen/permafrost_ground_layer/may_runs/staged/", 
    "dir_geotiff": "/home/jcohen/permafrost_ground_layer/may_runs/geotiff/",
    "dir_web_tiles": "/home/jcohen/permafrost_ground_layer/may_runs/web-tiles/",
    "filename_staging_summary": "/home/jcohen/permafrost_ground_layer/may_runs/staging_summary.csv",
    "filename_rasterization_events": "/home/jcohen/permafrost_ground_layer/may_runs/rasterization_events.csv",
    "filename_rasters_summary": "/home/jcohen/permafrost_ground_layer/may_runs/rasters_summary.csv",
    "tms_id": "WGS1984Quad",
    "z_range": [
      0,
      10
    ],
    "geometricError": 57,
    "z_coord": 0,
    "statistics": [
      {
        "name": "coverage_category",
        "weight_by": "area",
        "property": "extent_code",
        "aggregation_method": "max",
        "resampling_method": "mode",
        "val_range": [1,4],
        "palette": [
          "#17c0d3",
          "#179bd3",
          "#1775d3",
          "#154abc"
        ],
        "nodata_val": 0,
        "nodata_color": "#ffffff00"
      }
    ],
    "deduplicate_method": None,
    "deduplicate_at": None
  }

@julietcohen

Polygons south of 45 degrees latitude are absent in tilesets

Brown et al. dataset in QGIS

Polygons with one of the 4 categories of the EXTENT value (not NA) are present below 45 degrees latitude in the original data from Brown et al. When I visualize the data with the viz workflow, the output tilesets appear to be cut off at 45 degrees latitude, with the polygons south of that missing. I wonder why this is.

Here is the Brown et al. data visualized in QGIS, with a palette that shows dark blue as continuous extent (C), lighter blues for less continuous extents (D and I and S), and white for other (NA value).
image

Brown et al. dataset in tilesets

In the PDG demo portal, the polygons are cut off at 50 degrees latitude when zoomed out and 45 degrees latitude when zoomed in; the lower polygons appear only when you zoom in. No matter the zoom level, a clear latitudinal line serves as a cutoff for the polygons.

image

Zooming in, we also see a few undesired vertical/longitudinal "stripes" that extend south from 45 degrees, originating from the southernmost parts of the polygons. These stripes closely resemble the bands we saw stretch around the world horizontally/latitudinally when the polygons that cross the antimeridian became distorted by the conversion to WGS84. Note that these vertical stripes only appear when zoomed in very far; they are not present when zoomed out. Northern Washington state is pictured here.

image

@julietcohen

The limiting factor may be the bounding box created in viz-staging here. Perhaps the bottom of the bounding box is hard-coded to a certain latitude somehow.

@julietcohen commented May 15, 2024

In viz-staging, we use geopandas total_bounds here to determine the bounding box of the TMS. In a notebook, I imported the input data for the viz workflow (the gpkg output from the cleaning script), converted it to EPSG:4326, and ran gdf.total_bounds. The output, array([-179.99997917, 27.42921194, 179.99997917, 83.6228101]), has a miny of ~27, so it should include polygons that extend south of 45 degrees latitude. I think a good next step is to start the viz workflow again and check the grid and bounds variables to confirm that the bbox is what we want.
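For reference, a minimal sketch of that check, assuming geopandas (the file name refers to the cleaned output of the script above):

check_bounds.py (sketch)
import geopandas as gpd

gdf = gpd.read_file("permaice_clean_split.gpkg")  # output of the cleaning script
gdf = gdf.to_crs("EPSG:4326")
print(gdf.total_bounds)  # [minx, miny, maxx, maxy]; miny ≈ 27, well south of 45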

@julietcohen

I subset the cleaned data (output of the cleaning script) to just a handful of polygons (those around ~30 degrees latitude) to prove that the viz workflow can process polygons that far south. Here they are in local Cesium:
image
