Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Line and Polygon extract optimisation #824

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

Line and Polygon extract optimisation #824

wants to merge 12 commits into from

Conversation

rafaqz
Copy link
Owner

@rafaqz rafaqz commented Dec 1, 2024

This PR adds a number of optimizations to make line and polygon extract faster.

  • threading (points too)
  • direct extraction of lines during line burning (refactors burning a little to facilitate)
  • memory optimisation
  • optimisation of extract from RasterStack in combination with fixes in DD

Should end up an order of magnitude better for rasters with threading, and two orders for larger stacks that had type instability.

This will need the latest DD to pass.

Also adds a few more keywords like flatten that allows you to opt-out of flattening extracted points to a single vector with flatten=false.

@tiemvanderdeure if you have time to have a look that would help, also the optimisations may be interesting!

(Proably will update DD properly elsewhere, but its needed here)

@tiemvanderdeure
Copy link
Contributor

Blazing fast and so clean! Don't understand all of the code yet, but I tried a little example just to profile and look at the code. It does seem like it's still allocating somewhere.

using Rasters, GeoInterface, Rasters.Lookups
ras = rand(X(Sampled(1.0:90.0, sampling = Intervals(End()))), Y(Sampled(1.0:90.0, sampling = Intervals(End())))) |> Raster
linestrings = [GeoInterface.LineString([(rand(1.0:90.0), rand(1.0:90.0)), (rand(1.0:90), rand(1:90.0))]) for _ in 1:20_000]
@time extract(ras, linestrings);
  0.104471 seconds (140.07 k allocations: 21.073 MiB, 5.77% gc time)

20 thousand linestrings extracted in 0.1 second is still really really good

@tiemvanderdeure
Copy link
Contributor

In the example above @code_warntype shows the return type is a union - I think because of flatten.

Another minor comment is that I think progress could be false by default. For 99% of users extracting will just take a fraction of a second.

@rafaqz
Copy link
Owner Author

rafaqz commented Dec 9, 2024

I think the return types will be a union because the options change it? The top level isn't type stable.

And the progress won't show at all if it's fast enough? I put it by default as you only usually know after you try something and see how slow it is.

For the allocations it has to allocate all the vectors as it goes once it knows the Manhattan line distance of each line. But it does seem like a few too many allocations still. It can be at minimum ~20k in this example.

My next idea is to use this line burning in zonal and do online stats or just one workspace vector per thread where online is not possible. That will be even faster.

@tiemvanderdeure
Copy link
Contributor

I think the return types will be a union because the options change it?

You're right.

And the progress won't show at all if it's fast enough?

You're right again. The exact example I was trying was giving me a
Extracting geometries... 100%|██████████████████████████████████████████████████| Time: 0:00:00
of 0 seconds all the time. But if it's a bit faster it won't show and if it's a bit slower i makes sense.

I understand each linestring will need its own vector allocation even if it's flattened after. Still seems there is a little too much red here for this to be quite right?

billede

@rafaqz
Copy link
Owner Author

rafaqz commented Dec 9, 2024

Yeah it seems a bit much. There is some allocation because the union is too complicated and indexing needs a convert. We can maybe be smarter about it somehow but the remaining gains are pretty small now.

@tiemvanderdeure
Copy link
Contributor

the remaining gains are pretty small now

From here on out it's just perfectionism :)

@rafaqz
Copy link
Owner Author

rafaqz commented Dec 10, 2024

"Profile golf"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants