Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transform dataset coordinates into specified crs #97

Merged
merged 9 commits into from
Dec 21, 2023
16 changes: 14 additions & 2 deletions xee/ext.py
Original file line number Diff line number Diff line change
Expand Up @@ -253,6 +253,10 @@ def __init__(
else:
self.mask_value = mask_value

self.coordinate_transformer = pyproj.Transformer.from_crs(
'EPSG:4326', self.crs, always_xy=True
)

@functools.cached_property
def get_info(self) -> Dict[str, Any]:
"""Make all getInfo() calls to EE at once."""
Expand Down Expand Up @@ -555,11 +559,19 @@ def get_variables(self) -> utils.Frozen[str, xarray.Variable]:
lon_grid = self.project((0, 0, v0.shape[1], 1))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @boothmanrylan we figured out that the issue is with pixelLonLat() actually as it gives the coordinates always into the degree so instead of that if you can use this lnglat_img = ee.Image.pixelCoordinates(ee.Projection(self.crs_arg)) then it gives the coordinates into given crs format.

Also change the bandIds from longitude to x into at line 558.
-> Same with latitude to y.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

lat_grid = self.project((0, 0, 1, v0.shape[2]))
lon = self.image_to_array(
lnglat_img, grid=lon_grid, dtype=np.float32, bandIds=['longitude']
lnglat_img,
grid=lon_grid,
dtype=np.float32,
bandIds=['longitude', 'latitude'],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason you're including latitude in here?

Same with lat = self.image_to_array( (except reversed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, pyproj.Transformer.transform requires both the x and y coordinates be given.

If my understanding of lon_grid and lat_grid is correct, then the original calls to self.image_to_array return the longitude of the topmost and latitude of the leftmost edges of the bounds respectively. If we give those as the x and y inputs to the transformer, what we get back is the transformed coordinates of the diagonal of the bounds which differs from transforming the coordinates of the leftmost and topmost edges separately which is what I'm doing and why latitude is included in the image_to_array call to get the longitude and vice versa.

Looking at this again, I think that using multidimensional coordinates may be a better solution because the difference between the two methods occurs because the x coordinates vary along the y axis and the y coordinates vary along the x axis...

Open to input on that though.

Here is a notebook that shows the difference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and looked at this again and I believe the way I'm doing it is correct.

When I open a dataset -> grab the first image -> write to a geotiff using rioxarray -> upload back to earthengine -> compare against the original first image in the collection I get an exact match. See this notebook

Additionally, extracting the top edge and left edge without grabbing both the longitude and latitude and transforming those together will fail if the region is not perfectly square because pyproj.Transformer.transform needs the x inputs and y inputs to have the same length.

)
lon = self.coordinate_transformer.transform(lon[0], lon[1])[0]
lat = self.image_to_array(
lnglat_img, grid=lat_grid, dtype=np.float32, bandIds=['latitude']
lnglat_img,
grid=lat_grid,
dtype=np.float32,
bandIds=['longitude', 'latitude'],
)
lat = self.coordinate_transformer.transform(lat[0], lat[1])[1]
width_coord = np.squeeze(lon)
height_coord = np.squeeze(lat)

Expand Down
75 changes: 75 additions & 0 deletions xee/ext_integration_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,10 @@
import numpy as np
import xarray as xr
from xarray.core import indexing
import os
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple import suggestions:

  • os has already been imported. Please remove.
  • tempfile is a built-in Python module. Please move it up below pathlib.
  • rioxarray is unused. Please remove.
  • rasterio is not defined as a test dependency. Please include it in project.optional-dependencies tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do for os tempfile and rasterio

However, from the rioxarray docs:

"rioxarray extends xarray with the rio accessor. The rio accessor is activated by importing rioxarray like so: import rioxarray"

so without that import, ds.rio.to_raster(...) etc. fail with:

AttributeError: 'Dataset' object has no attribute 'rio'

I can add rioxarray as a test dependency along with rasterio though.

import rioxarray
import rasterio
import tempfile
import xee

import ee
Expand Down Expand Up @@ -397,6 +401,77 @@ def test_validate_band_attrs(self):
for _, value in variable.attrs.items():
self.assertIsInstance(value, valid_types)

def test_write_projected_dataset_to_raster(self):
# ensure that a projected dataset written to a raster intersects with the
# point used to create the initial image collection
with tempfile.TemporaryDirectory() as temp_dir:
temp_file = os.path.join(temp_dir, 'test.tif')

crs = 'epsg:32610'
proj = ee.Projection(crs)
point = ee.Geometry.Point([-122.44, 37.78])
geom = point.buffer(1024).bounds()

col = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
col = col.filterBounds(point)
col = col.filter(ee.Filter.lte('CLOUDY_PIXEL_PERCENTAGE', 5))
col = col.limit(10)

ds = xr.open_dataset(
col,
engine=xee.EarthEngineBackendEntrypoint,
scale=10,
crs=crs,
geometry=geom,
)

ds = ds.isel(time=0).transpose('Y', 'X')
ds.rio.set_spatial_dims(x_dim='X', y_dim='Y', inplace=True)
ds.rio.write_crs(crs, inplace=True)
ds.rio.reproject(crs, inplace=True)
ds.rio.to_raster(temp_file)

with rasterio.open(temp_file) as raster:
# see https://gis.stackexchange.com/a/407755 for evenOdd explanation
bbox = ee.Geometry.Rectangle(raster.bounds, proj=proj, evenOdd=False)
intersects = bbox.intersects(point, 1, proj=proj)
self.assertTrue(intersects.getInfo())

def test_write_dataset_to_raster(self):
# ensure that a dataset written to a raster intersects with the point used
# to create the initial image collection
with tempfile.TemporaryDirectory() as temp_dir:
temp_file = os.path.join(temp_dir, 'test.tif')

crs = 'EPSG:4326'
proj = ee.Projection(crs)
point = ee.Geometry.Point([-122.44, 37.78])
geom = point.buffer(1024).bounds()

col = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
col = col.filterBounds(point)
col = col.filter(ee.Filter.lte('CLOUDY_PIXEL_PERCENTAGE', 5))
col = col.limit(10)

ds = xr.open_dataset(
col,
engine=xee.EarthEngineBackendEntrypoint,
scale=0.0025,
geometry=geom,
)

ds = ds.isel(time=0).transpose('lat', 'lon')
ds.rio.set_spatial_dims(x_dim='lon', y_dim='lat', inplace=True)
ds.rio.write_crs(crs, inplace=True)
ds.rio.reproject(crs, inplace=True)
ds.rio.to_raster(temp_file)

with rasterio.open(temp_file) as raster:
# see https://gis.stackexchange.com/a/407755 for evenOdd explanation
bbox = ee.Geometry.Rectangle(raster.bounds, proj=proj, evenOdd=False)
intersects = bbox.intersects(point, 1, proj=proj)
self.assertTrue(intersects.getInfo())


if __name__ == '__main__':
absltest.main()
Loading