stars_proxy memory hog #708

Open
dazu89 opened this issue Sep 1, 2024 · 1 comment
dazu89 commented Sep 1, 2024

I am building a high-dimensional data cube from raster files in plain-text ASCII grid format. I read all files' metadata (file path and attributes) into a data frame (1), group by dimensions and concatenate the files in each group into a stars_proxy (2), and then combine the stars_proxy objects into a higher-dimensional stars_proxy (3), similar to the process described in this post on StackExchange or this GitHub issue.
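For concreteness, the three steps could look roughly like this (a minimal sketch; the file paths, the `files` metadata table, and the dimension names are hypothetical, not my actual setup):

```r
library(stars)

# (1) hypothetical metadata table: one ASCII grid per variable/date
files <- data.frame(
  path = c("var1_2001.asc", "var1_2002.asc",
           "var2_2001.asc", "var2_2002.asc"),
  var  = rep(c("var1", "var2"), each = 2),
  date = as.Date(rep(c("2001-01-01", "2002-01-01"), 2))
)

# (2) per variable, stack the files into a stars_proxy along a date dimension
proxies <- lapply(split(files$path, files$var), function(p)
  st_redimension(read_stars(p, proxy = TRUE),
                 along = list(date = sort(unique(files$date)))))

# (3) combine the per-variable proxies into one higher-dimensional proxy
cube <- do.call(c, proxies)
```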

Upon loading the stars_proxy via my_star_proxy |> st_as_stars(), memory usage climbs into tens of GB, even when only a couple of files of 5-10 MB each are read. The problem only occurs with files of the following format:

ncols                   500
nrows                  500
xllcorner              6.5
yllcorner              -65.5
cellsize                 0.002
NODATA_value            -9.9990E+03
-9.9990E+03 -9.9990E+03 -9.9990E+03 -9.9990E+03 -9.9990E+03 ...
-9.9990E+03 -9.9990E+03  0.5000E-02  1.5000E+02 -9.9990E+03 ...
-9.9990E+03 -9.9990E+03 -9.9990E+03 -9.9990E+03 -9.9990E+03 ...
...

whereas with standard data no such problem occurs and only a couple hundred MB are used.
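To check whether the AAIGrid format itself triggers the blow-up, a tiny self-contained file in the same layout can be written and read back (the values below are made up and `tiny.asc` is a hypothetical path; this is a reduction idea, not my actual data):

```r
library(stars)

# minimal ASCII grid mimicking the header above (values are made up)
writeLines(c(
  "ncols 3",
  "nrows 2",
  "xllcorner 6.5",
  "yllcorner -65.5",
  "cellsize 0.002",
  "NODATA_value -9.9990E+03",
  "-9.9990E+03  0.5000E-02  1.5000E+02",
  "-9.9990E+03 -9.9990E+03 -9.9990E+03"
), "tiny.asc")

# read as a proxy, then materialize
read_stars("tiny.asc", proxy = TRUE) |> st_as_stars()
```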

library(stars)
library(profmem)
options(profmem.threshold = 1e6)
tif = system.file("tif/L7_ETMs.tif", package = "stars")
rs_mem = read_stars(tif)
print(object.size(rs_mem), standard = "SI", units = "auto")
# build a nested proxy: two attributes, each from two copies of the same file
r = read_stars(list(a = c(tif, tif), b = c(tif, tif)), proxy = TRUE)
(xx = st_redimension(r, along = list(foo = 1:4)))
(rr = c(xx, xx))
(rrr = st_redimension(rr, along = list(bar = as.Date(c("2001-01-01", "2002-01-01")))))
# materialize the proxy and profile the allocations
p <- profmem({
  test = rrr |> st_as_stars()
})
sum(p$bytes, na.rm = TRUE) / 1e6  # total MB allocated

I suspect I should supply some options to read_stars(), but so far I have no good guess.
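One documented knob that limits how much read_stars() materializes is its RasterIO argument, which passes a read window (and optionally a buffer size) through to GDAL. Whether it helps with the ASCII grid case above is untested; this just shows the mechanism on the example TIFF:

```r
library(stars)
tif = system.file("tif/L7_ETMs.tif", package = "stars")

# read only a 100 x 100 window instead of the full raster
rio = list(nXOff = 1, nYOff = 1, nXSize = 100, nYSize = 100)
small = read_stars(tif, RasterIO = rio)
dim(small)
```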
