Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Can I perform operations on the geocube pixels during rasterization? Count, Mean, etc. #28

Open
bw4sz opened this issue Jun 18, 2020 · 11 comments
Labels
documentation Documentation related issue good first issue Good for newcomers question Further information is requested

Comments

@bw4sz
Copy link

bw4sz commented Jun 18, 2020

Hi this is my first foray into geocube. Thanks for your work here, clearly a good idea.

I'd like to add an example usage that I believe should be possible, but not just quite getting right.

Is it possible to count the number of polygons in a cell, or perform operations on the geocube pixels during rasterization? Imagine a geopandas dataframe with many polygons. For each cell, perform some numpy operation (like rasterstats does) on the incoming polygons. I can perform pandas like operation after creating a geocube, but it doesn't preserve the spatial object. I feel like the group_by argument is in this direction, but could not succeed.

Something like

#Turn each set of predictions into a raster
import geopandas as gpd
from shapely.geometry import Polygon
from geocube.api.core import make_geocube

#a numeric column to count
g["mask"] = 1

cube = make_geocube(vector_data=g, resolution=(-50, 50), function=mask.count())

To create a heatmap of number of polygons in each cell.

Here is a notebook to help illustrate use case (sorry the geocube svg doesn't play well with ipython on github).

https://github.com/weecology/NEON_crown_maps/blob/master/notebooks/Create%20a%20raster%20of%20polygon%20counts.ipynb

@bw4sz bw4sz added the proposal Idea for a new feature. label Jun 18, 2020
@bw4sz
Copy link
Author

bw4sz commented Jun 18, 2020

I think it would go right here

vector_data = vector_data.groupby(group_by)

@snowman2
Copy link
Member

snowman2 commented Jun 18, 2020

Should probably add better documentation around this, but you can pass in a custom rasterization function (examle in docs):

from functools import partial

from rasterio.enums import MergeAlg
from geocube.rasterize import rasterize_image
from geocube.api.core import make_geocube

cube = make_geocube(
...
    rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add),
    fill=0,
)

With this method, each cell should contain the count of the number of times a polygon overlapped the cell.

@bw4sz
Copy link
Author

bw4sz commented Jun 18, 2020 via email

@bw4sz
Copy link
Author

bw4sz commented Jun 18, 2020

I'm still debugging, i'll update when I fix.

from functools import partial
from rasterio.enums import MergeAlg
from geocube.rasterize import rasterize_image

g =
gpd.read_file("/Users/ben/Dropbox/Weecology/Crowns/examples/2019_BART_5_320000_4881000_image.shp")
g["mask"]=1
cube = make_geocube(vector_data=g, resolution=(100, 100),
 measurements=['mask'], rasterize_function=partial(rasterize_image,
merge_alg=MergeAlg.add))
cube.plot()

isn't quite right, the cube is blank

cube
<xarray.Dataset>
Dimensions:      (x: 10, y: 10)
Coordinates:
  * y            (y) float64 4.881e+06 4.881e+06 ... 4.882e+06 4.882e+06
  * x            (x) float64 3.200e+05 3.202e+05 ... 3.208e+05 3.21e+05
    spatial_ref  int64 0
Data variables:
    left         (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
    bottom       (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
    right        (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
    top          (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
    height       (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
    area         (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
    mask         (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan
nan
Attributes:
    grid_mapping:  spatial_ref

cube.plot()
Traceback (most recent call last):
  Python Shell, prompt 30, line 1
  File
"/Users/ben/miniconda3/envs/crowns/lib/python3.7/site-packages/xarray/plot/dataset_plot.py",
line 162, in __call__
    "Dataset.plot cannot be called directly. Use "
builtins.ValueError: Dataset.plot cannot be called directly. Use an
explicit plot method, e.g. ds.plot.scatter(...)

or specifying measurements=['mask']

cube.mask
<xarray.DataArray 'mask' (y: 10, x: 10)>
array([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]])
Coordinates:
  * y            (y) float64 4.881e+06 4.881e+06 ... 4.882e+06 4.882e+06
  * x            (x) float64 3.200e+05 3.202e+05 ... 3.208e+05 3.21e+05
    spatial_ref  int64 0
Attributes:
    name:          mask
    long_name:     mask
    grid_mapping:  spatial_ref
    _FillValue:    nan

@snowman2
Copy link
Member

Try using fill=0

@bw4sz
Copy link
Author

bw4sz commented Jun 19, 2020

I wasn't sure where to place, but tried both, no change.

g = gpd.read_file("/Users/ben/Dropbox/Weecology/Crowns/examples/2019_BART_5_320000_4881000_image.shp")
g["mask"]=1
cube = make_geocube(vector_data=g, resolution=(100, 100), measurements=['mask'] , rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add, fill=0))
cube.plot()
cube.rio.to_raster("/Users/Ben/Desktop/data1.tif")
Traceback (most recent call last):
  Python Shell, prompt 41, line 1
  File "/Users/ben/miniconda3/envs/crowns/lib/python3.7/site-packages/xarray/plot/dataset_plot.py", line 162, in __call__
    "Dataset.plot cannot be called directly. Use "
builtins.ValueError: Dataset.plot cannot be called directly. Use an explicit plot method, e.g. ds.plot.scatter(...)
cube
<xarray.Dataset>
Dimensions:      (x: 10, y: 10)
Coordinates:
  * y            (y) float64 4.881e+06 4.881e+06 ... 4.882e+06 4.882e+06
  * x            (x) float64 3.200e+05 3.202e+05 ... 3.208e+05 3.21e+05
    spatial_ref  int64 0
Data variables:
    mask         (y, x) float64 nan nan nan nan nan nan ... nan nan nan nan nan
Attributes:
    grid_mapping:  spatial_ref
g = gpd.read_file("/Users/ben/Dropbox/Weecology/Crowns/examples/2019_BART_5_320000_4881000_image.shp")
g["mask"]=1
cube = make_geocube(vector_data=g, resolution=(100, 100), measurements=['mask'] , fill=0, rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add))
cube.plot()
cube

/Users/ben/miniconda3/envs/crowns/lib/python3.7/site-packages/datacube/utils/geometry/_base.py:301: DeprecationWarning: Please use `str(crs)` instead of `crs.crs_str`
  warnings.warn("Please use `str(crs)` instead of `crs.crs_str`", category=DeprecationWarning)
Traceback (most recent call last):
  Python Shell, prompt 43, line 1
  File "/Users/ben/miniconda3/envs/crowns/lib/python3.7/site-packages/xarray/plot/dataset_plot.py", line 162, in __call__
    "Dataset.plot cannot be called directly. Use "
builtins.ValueError: Dataset.plot cannot be called directly. Use an explicit plot method, e.g. ds.plot.scatter(...)

here is the data link if its useful. I also tried removing the measurements arg. No change. Not 100%, but just using the plot error as an indicator of success.

https://www.dropbox.com/s/jwxlldua58n82du/2018_BART_4_320000_4881000_image.shp?dl=0

@snowman2
Copy link
Member

The second one is correct. I suggest doing cube.mask.plot()

@bw4sz
Copy link
Author

bw4sz commented Jun 19, 2020 via email

@snowman2 snowman2 added question Further information is requested and removed proposal Idea for a new feature. labels Dec 14, 2020
@snowman2 snowman2 added the documentation Documentation related issue label Jun 14, 2022
@snowman2 snowman2 changed the title Can I perform operations on the geocube pixels during rasterization? Count, Mean, etc. DOC: Can I perform operations on the geocube pixels during rasterization? Count, Mean, etc. Jun 14, 2022
@snowman2 snowman2 added the good first issue Good for newcomers label Jun 14, 2022
@davidbrochart
Copy link

I was trying to do something similar, with the polygons of all cities in the world, and I would expect to get a count of cities in each cell, but it looks like I only get 0/1 values. Here is an example notebook:

heatmap.ipynb.gz

@davidbrochart
Copy link

Fixed by passing all_touched=True:

rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add, all_touched=True)

@raybellwaves
Copy link
Contributor

Thanks @snowman2 for your work here and others for asking questions. This helped me this evening.

Not to myself to make a PR tomorrow to add "Examples - count points in grid cells" to https://corteva.github.io/geocube/stable/examples/examples.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Documentation related issue good first issue Good for newcomers question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants