Error reading some raster files using mos.read. Size issue? #550
Comments
Actually, looking at the release notes, maybe the change did not make it into 0.4? https://github.com/databrickslabs/mosaic/releases/tag/v_0.4.1
@milos-colic will have an authoritative answer here, but I think you'll need to use the 'retile_on_read' strategy for reading large rasters, since there's no way around the 2GB limit on each row object in Spark.

raster_df = (
    spark.read
    .format("gdal")
    .option("raster.read.strategy", "retile_on_read")  # sets the reader strategy
    .option("sizeInMB", "42")  # sets the upper bound for the size of the raster in each row of the output dataframe
    .load("/path/to/file")
)
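To illustrate the point about the 2GB row limit, here is a minimal sketch of choosing the reader options from the file size. The helper names `gdal_reader_options` and `build_reader` are hypothetical (they are not part of Mosaic), the limit is assumed to mean 2 GiB, and `os.path.getsize` assumes the file is reachable through a local-style path. Only the option names and values quoted above are used; for small files the sketch passes no options and relies on the reader's defaults.

```python
import os

# Per-row ceiling mentioned in the thread; assumed here to mean 2 GiB.
SPARK_ROW_LIMIT_BYTES = 2 * 1024**3


def gdal_reader_options(file_size_bytes, tile_size_mb=42):
    """Hypothetical helper: return reader options based on file size.

    Files under the row limit get no extra options (reader defaults apply).
    Larger files get the 'retile_on_read' strategy with a per-row size cap.
    """
    if file_size_bytes < SPARK_ROW_LIMIT_BYTES:
        return {}
    return {
        "raster.read.strategy": "retile_on_read",
        "sizeInMB": str(tile_size_mb),
    }


def build_reader(spark, path, tile_size_mb=42):
    """Hypothetical helper: load a raster using the options chosen above."""
    options = gdal_reader_options(os.path.getsize(path), tile_size_mb)
    return spark.read.format("gdal").options(**options).load(path)
```

With the two file sizes from the original report, the 87 MB file would be read with default options and the 7.9 GB file with 'retile_on_read'. Whether on-disk size is the right proxy for the in-memory row size is itself an assumption, since a compressed raster can expand considerably when decoded.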
I didn't realise this was available in the options. I'll try it out and get back to you. By the way, I think it would be good to explicitly call this out in the documentation. Thanks.
Agreed. Hope it helps you make progress.
Hi @sllynn. No luck, unfortunately. I'm just trying to turn a raster into an H3 table. This is my code:
Error is below: Could it be because my raster is in CRS 54009 rather than WGS84? The file is available here if you/anyone wants to try to debug: I will try to complete the process using a WGS84 version of the file in the meantime ...
Just wanted to add that the 'retile on read' option does work. It was the next stage of my code (converting to H3) that was causing the crash. I should add that retile on read is very slow. I find myself wondering why it is physically re-writing our smaller files. Why not just leverage VRTs?
We got 0.4.2 out, but it didn't include the raster_to_grid and similar work involving tessellate performance. We had to streamline it due to a dependency issue that arose from the latest geopandas, see docs. So, 0.4.3 is coming soon with more "in-flight" work.
Hello.
The second file (87 MB) works. The first (7.9 GB) does not.
I recall there was an issue with reading files larger than 2GB, but I thought that this had been resolved with Mosaic 0.4. So is it something else?