Breaking: Faster/better aggregate
#763
base: main
Conversation
checkbounds(src, u...)
# If a disk array, cache the src so we don't read too many times
src_parent = isdisk(src) ? DiskArrays.cache(parent(src)) : parent(src)
@inbounds broadcast!(dst, CartesianIndices(dst)) do I
Would it be worthwhile to optionally run this threaded?
Yeah, it could be a threaded for loop, I guess.
Oh, the reason not to is that it currently works on GPU and DiskArrays as is; threading will break that.
Well, if threaded is false by default then you could either broadcast or do a threaded for loop. But after seeing just how fast this is, I think that would only be relevant in some niche use cases.
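For the record, a minimal sketch of what that dispatch could look like (hypothetical helper name; the broadcast path is what keeps GPU and DiskArrays support):

function _fill_dst!(f, dst; threaded=false)
    if threaded
        # plain threaded loop: only safe for in-memory arrays
        Threads.@threads for I in CartesianIndices(dst)
            @inbounds dst[I] = f(I)
        end
    else
        # generic path: broadcast handles GPU kernel launches and disk chunks
        broadcast!(f, dst, CartesianIndices(dst))
    end
    return dst
end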
src/methods/aggregate.jl
Outdated
$FILENAME_KEYWORD
$SUFFIX_KEYWORD
$PROGRESS_KEYWORD
$THREADED_KEYWORD

Note: currently it is faster to aggregate over memory-backed arrays.
Should this say disaggregate?
It can be deleted; the caching should make it fast now.
Actually no, cache is broken on our DiskArrays version 😭
src/methods/aggregate.jl
Outdated
const SKIPMISSING_KEYWORD = """
- `skipmissing`: if `true`, any `missingval` will be skipped during aggregation, so that
    only areas of all missing values will be aggregated to `missingval`. If `false`, any
    aggregated area containing a `missingval` will be assigned `missingval`.
"""
add "false by default"
src/methods/aggregate.jl
Outdated
    disaggregate!((locus,), dst, src, scale)
end
function disaggregate!(loci::Tuple{Locus,Vararg}, dst::AbstractRaster, src, scale)
function disaggregate!(dst::AbstractRaster, src, scale)
    intscale = _scale2int(DisAg(), dims(src), scale)
    broadcast!(dst, CartesianIndices(dst)) do I
It must be faster to loop through the src instead, like so, since we know it is smaller. By looping through dst we have lots of unnecessary calls to upsample and lookups in src - with an intscale of (10, 10), for example, every src cell is read 100 times.
Suggested change:
-broadcast!(dst, CartesianIndices(dst)) do I
+for I in CartesianIndices(src)
+    upper = upsample.(Tuple(I), intscale)
+    lower = upper .+ intscale .- 1
+    I_dst = map(:, upper, lower)
+    val = src[I]
+    val = val === missingval(src) ? missingval(dst) : val
+    view(dst, I_dst...) .= val
+end
+return dst
Can't comment on unchanged lines, but if we implement this then downsample isn't used anywhere anymore.
Right, yes, it will be faster. I think the hidden reason again is that it won't work on a GPU or a disk array, and I was trying to be generic.
We could use Flatten.jl to see if there is an array inside whatever wrappers; if there is, do the loop, if not, the broadcast.
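A rough sketch of that dispatch idea, using recursive parent unwrapping as a simple stand-in for the Flatten.jl query (all helper names hypothetical):

# walk `parent` until it stops changing; plain Arrays return themselves
_innermost(A) = parent(A) === A ? A : _innermost(parent(A))

function disaggregate_dispatch!(dst, src, intscale)
    if _innermost(src) isa Array && _innermost(dst) isa Array
        _disaggregate_loop!(dst, src, intscale)      # scalar indexing is cheap in memory
    else
        _disaggregate_broadcast!(dst, src, intscale) # GPU/disk-safe generic path
    end
end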
We need DiskArraysKernelAbstractions; then we could just do all of this with KernelAbstractions, and use Stencils.jl too to make it even faster.
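For reference, a minimal KernelAbstractions.jl sketch of the disaggregate fill as a single kernel launch (assuming an integer intscale tuple and ignoring missingval handling; not what this PR does):

using KernelAbstractions

# map each dst index back to its src cell; cld is the inverse of upsample
@kernel function disagg_kernel!(dst, @Const(src), intscale)
    I = @index(Global, Cartesian)
    J = CartesianIndex(cld.(Tuple(I), intscale))
    @inbounds dst[I] = src[J]
end

backend = get_backend(dst)
disagg_kernel!(backend)(dst, src, intscale; ndrange=size(dst))
KernelAbstractions.synchronize(backend)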
What about this wouldn't work on GPU? We're just viewing into an array and filling it with some value, right?
It's all scalar indexing. We need to launch a single kernel to do the whole lot in one go.
Well, it's lots of little views at least. But looking at it I don't think the original will work on GPU either - but it can work on DiskArrays, and it's fast now since we have cache.
Broadcasts handle kernel launches for us. If you want to write manual code like that with blocks we need KernelAbstractions.jl.
(And actually for this to work on GPU we need KernelAbstractions. But it works on disk...)
Another idea is to do some clever reshapes and permutes to be able to broadcast without having to do indexing math ourselves. I tried this implementation and it works, but it's actually a little slower than the original. I guess the reshaping makes indexing that much slower.
intscale = _scale2int(DisAg(), dims(src), scale)
n = length(intscale)
reshape_dims = ntuple(i -> mod(i, 2) == 0 ? size(src)[(i ÷ 2)] : intscale[(i ÷ 2) + 1], Val(n*2))
permute_order = (range(2; step = 2, length = n)..., range(1, step = 2, length = n)...)
dst_reshaped = Base.PermutedDimsArray(Base.reshape(dst, reshape_dims), permute_order)
broadcast!(dst_reshaped, parent(src)) do val
val === missingval(src) ? missingval(dst) : val
end
return dst
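To make the index gymnastics concrete: for a src of size (2, 3) and intscale (10, 10), dst has size (20, 30); the reshape splits it to (10, 2, 10, 3), and the permutation (2, 4, 1, 3) turns that into a (2, 3, 10, 10) view, so the (2, 3) src broadcasts over the trailing block dimensions.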
Amazing! Did a few quick tests/profviews and it seems blazing fast - I can't come up with a way to improve there. I just had a quick thing for disaggregate. A few other suggestions (short or long term) - now that we're making considerable changes to this function anyway:
It's there! I also threaded it.
You mean
Yeah it was going to be, but never happened lol. Probably it should be another geostatistics gap-filling method instead of here.
Yes I think that could be the default if It could also depend on the order - so we don't aggregate along
Ok I've implemented this so you have to explicitly aggregate categorical dimensions. I'm not sure what to do with the categories? Maybe we need to join as strings somehow? Or return With Edit: I think we need an aggregate
@tiemvanderdeure want to do one last review of this? Would be good to get it in with the other breaking changes.
I think there's a commit missing here. The behaviour with unordered or missing lookups is not implemented or tested.
But you said this might be on your old laptop, right?
Oh no, yes it probably is. I'll have to put the NVMe in some other computer.
Bump! Did you get your data back from the broken laptop?
Ugh, not yet, but I have it on an external drive at least. Will do it soon.
aggregate was written so long ago, was never optimized, and a bunch of things never made sense.

This PR adds optional threading, general performance improvements, and specific fast paths for common methods like sum and mean. For a RasterStack this can be an order of magnitude or more improvement overall.

At the same time I fixed the dumb disaggregate arguments.

@tiemvanderdeure this was inspired by your comment about aggregate proving to be actually slower than resample. So if you want to review...
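For a flavour of what such a fast path can look like, a minimal sketch with hypothetical names (assuming integer scales, a memory-backed matrix, and no missing values; not the PR's actual implementation):

using Statistics

# aggregate by taking the mean of each scale[1] x scale[2] block,
# reading each source element exactly once via views
function agg_mean(src::AbstractMatrix, scale::NTuple{2,Int})
    dst = similar(src, float(eltype(src)), size(src) .÷ scale)
    for I in CartesianIndices(dst)
        upper = (Tuple(I) .- 1) .* scale .+ 1
        block = view(src, map((u, s) -> u:u+s-1, upper, scale)...)
        dst[I] = mean(block)
    end
    return dst
end

agg_mean(rand(100, 100), (10, 10))  # 10x10 matrix of block means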