
workflow recommendations for simple spatial mean computations #687

richardbeare opened this issue May 27, 2024 · 5 comments

Hi,
I have GeoTIFF data where each tile is 109407 x 69515 pixels. The data are simple, single-band. I want to compute means over polygons (statistical areas from another source). Some polygons overlap tile boundaries, but I'll deal with that later.

I was hoping I could use st_apply to load and process a single row or column of data at a time, i.e. counting the number of non-missing pixels (those inside the polygon) and their sum, allowing the exact overall mean to be computed by combining the per-line data.
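
Conceptually, the combine step would look something like this (a minimal sketch; line_stats and overall_mean are hypothetical helpers, and pixels outside the polygon are assumed to already be NA):

# partial statistics for one line (row or column) of pixels
line_stats <- function(vals) c(s = sum(vals, na.rm = TRUE), n = sum(!is.na(vals)))

# exact overall mean from the combined per-line sums and counts
overall_mean <- function(sums, counts) sum(sums) / sum(counts)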

However, I don't seem to be able to do any polygon-related operations without triggering a load operation that exhausts memory. Subsampling the data will almost certainly be good enough for this project, but I'm obsessing about getting an efficient, exact solution.
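
For completeness, the subsampling fallback could be done with read_stars's RasterIO argument, e.g. (a sketch; the roughly 1/10 buffer size is arbitrary):

library(stars)
## read the full extent, but into a ~10x smaller buffer
rio <- list(nXSize = 109407, nYSize = 69515,
            nBufXSize = 10941, nBufYSize = 6952)
tile_small <- read_stars("the.tif", RasterIO = rio)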

The following appears to work with a reasonable time and memory footprint:

library(sf)
library(stars)

tile_proxy <- read_stars("the.tif", proxy = TRUE)
## tt is the polygon we're interested in
j1 <- st_crop(tile_proxy, st_bbox(tt))

j2 <- st_as_stars(st_apply(j1, MARGIN = 2, mean, na.rm = TRUE))

but as soon as I include a polygon-based indexing operation, RAM usage goes crazy:

j3 <- st_as_stars(st_apply(j1[tt], MARGIN = 2, mean, na.rm = TRUE))

Any suggestions?

edzer (Member) commented May 27, 2024

Have you tried aggregate(), as in

aggregate(tile_proxy, tt, mean, na.rm = TRUE)

kadyb (Contributor) commented May 27, 2024

Maybe it would be even better to use the exact = TRUE argument to engage exactextractr (it is very fast at summarizing values)?
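
Something along these lines (an untested sketch, assuming aggregate() forwards exact to its extraction step):

aggregate(tile_proxy, tt, mean, na.rm = TRUE, exact = TRUE)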

Edit: I checked the source of aggregate() and exact_extract() is not actually used there, so you would need to test {terra} as a backend.

richardbeare (Author) commented

No luck with aggregate - similar behaviour to st_apply: it slowly consumes RAM/virtual memory until it crashes.

I'll look into terra and exact_extract.

richardbeare (Author) commented

exactextractr::exact_extract works well, crunching through the test set in under a minute while using minimal RAM:

library(exactextractr)

# works with either the raster or terra loaders
a <- raster::raster(tifname)
g <- exact_extract(a, tt, "mean")

a <- terra::rast(tifname)
g <- exact_extract(a, tt, "mean")

terra::extract had the same RAM problem as the stars approach:

a <- terra::rast(tifname)
g <- terra::extract(a, tt, mean, na.rm = TRUE)

raster::extract didn't appear to have a RAM problem, but it took a really long time - I gave up after more than an hour.

a <- raster::raster(tifname)
g <- raster::extract(a, tt, mean, na.rm = TRUE)

kadyb (Contributor) commented May 28, 2024

If you are interested in the highest efficiency, I think the best option will be to use exact_extract() with a SpatRaster; it should be faster than with a RasterLayer. See these benchmarks:
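
A quick way to compare the two on your own data (a sketch reusing tifname and tt from above):

library(exactextractr)
system.time(exact_extract(raster::raster(tifname), tt, "mean"))  # RasterLayer
system.time(exact_extract(terra::rast(tifname), tt, "mean"))     # SpatRaster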
