Parametrize runWriter by a monoid used for the reduction
This makes it possible to express (parallel) reductions over arbitrary monoids. Thanks to this, we can start removing some nasty hacks (like the one used for `Eq (n=>a)`) and make the (work-in-progress) FFT example parallel!
Anyway, this whole change turned out to be surprisingly difficult, but thanks to many chats with @dougalm, I think that we've arrived at a particularly nice solution. The crux of the matter is that Dex, unlike most other languages with some form of built-in reduction operator, allows slicing the accumulator. This poses an interesting problem: if the user were to specify the `Monoid` instance for the full accumulator (e.g. a matrix), then what monoid are we supposed to use for its slice?! As it turns out, this might not even be well defined! For example, the type of square matrices with the identity matrix and matrix multiplication forms a monoid, but there is no natural "sub-monoid" we could use in an expression of the form `ref!i += ...`.
So, unless we're ok with giving up reference slicing (which we know we want for sure, since this is a way to express e.g. parallel scatters and histograms), we have to come up with a way of constructing those sub-monoids. And here, the answer is to turn the problem around: instead of asking the users to provide us with the monoids for the full references, we expect the monoid to refer to some _base type_ (and we call it a _base monoid_). That is, when the `Accum` reference is of type `n=>m=>...=>k=>a`, then any of `m=>...=>k=>a`, ..., `k=>a` and even `a` are considered base types. The monoid for the full reference, and for any of its slices, is then the base monoid lifted pointwise over the remaining table dimensions.
While this is a bit surprising at first, it turns out to actually be quite convenient, since it does seem more straightforward to say "I want this to be a reduction over `(Float, 0.0, +)`" instead of mentioning the full table type, a broadcast version of `0.0` and a pointwise-lifted version of `+`.
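To make the base-monoid idea concrete, here is a minimal sketch in Python rather than Dex. Everything in it (`Monoid`, `lift`, `histogram`, `float_add`) is a hypothetical stand-in invented for illustration, not part of the Dex prelude or compiler. It only shows that a monoid given for the base type `a` determines, by pointwise lifting, a monoid for the table `n=>a`, and that the same base monoid can be applied directly to a single slice of the accumulator.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Monoid:
    """Hypothetical stand-in: an identity element plus an associative combine."""
    empty: Any
    combine: Callable[[Any, Any], Any]


def lift(base: Monoid, n: int) -> Monoid:
    """Lift a monoid on `a` pointwise to a monoid on length-n tables (`n=>a`)."""
    return Monoid(
        empty=[base.empty for _ in range(n)],
        combine=lambda xs, ys: [base.combine(x, y) for x, y in zip(xs, ys)],
    )


# The base monoid (Float, 0.0, +) from the text.
float_add = Monoid(0.0, lambda x, y: x + y)


def histogram(bins: List[int], n_bins: int, base: Monoid) -> List[Any]:
    """Scatter-style reduction: each update touches one slice of the accumulator."""
    acc = [base.empty for _ in range(n_bins)]  # accumulator of type n=>Float
    for b in bins:
        # Rough analogue of `ref!b += 1.0`: the slice is updated with the base monoid.
        acc[b] = base.combine(acc[b], 1.0)
    return acc


# The same base monoid serves the full table (via lift) and every slice of it.
table_add = lift(float_add, 3)
counts = histogram([0, 2, 2, 1, 2], 3, float_add)
merged = table_add.combine(counts, [1.0, 0.0, 0.0])
```

Note that nothing comparable exists for the matrix-multiplication monoid: `lift` only ever produces pointwise monoids, which is exactly why the slices are always well defined.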
Finally, because many data types have multiple valid monoids (`Float` has at least four: `+`, `*`, `min`, `max`), the monoid argument is explicit and those instances can be obtained via the `named-instance` syntax added in the previous commits. Note that I've also included some helper functions which make it possible to synthesize `Monoid` instances automatically from the `Add` and `Mul` instances for any given type (see `AddMonoid` and `MulMonoid`). I haven't been able to fully verify the correctness of the parallelization change, because the CUDA backend seems to be broken anyway (sigh...), but the code it generates looks ok.
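As a rough illustration of why the monoid argument has to be explicit, the Python sketch below builds the four `Float` monoids mentioned above and reduces the same data with each. The names `Monoid`, `add_monoid`, `mul_monoid` and `reduce_with` are hypothetical; `add_monoid` and `mul_monoid` are only loose analogues of the idea behind `AddMonoid` and `MulMonoid` (deriving a monoid from a type's addition or multiplication), and they assume the caller supplies the matching identity element.

```python
import operator
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class Monoid:
    """Hypothetical stand-in: an identity element plus an associative combine."""
    empty: Any
    combine: Callable[[Any, Any], Any]


def add_monoid(zero: Any) -> Monoid:
    """Derive a monoid from a type's `+`, given its additive identity."""
    return Monoid(zero, operator.add)


def mul_monoid(one: Any) -> Monoid:
    """Derive a monoid from a type's `*`, given its multiplicative identity."""
    return Monoid(one, operator.mul)


# Four different monoids over the same carrier type (floats).
float_add = add_monoid(0.0)
float_mul = mul_monoid(1.0)
float_min = Monoid(float("inf"), min)   # identity of min is +infinity
float_max = Monoid(float("-inf"), max)  # identity of max is -infinity


def reduce_with(m: Monoid, xs: Iterable[Any]) -> Any:
    """Sequential reduction; the monoid is passed explicitly, never inferred."""
    return reduce(m.combine, xs, m.empty)


xs = [3.0, 1.0, 2.0]
results = [reduce_with(m, xs) for m in (float_add, float_mul, float_min, float_max)]
```

Since the type alone cannot pick between these four, an implicit instance lookup would be ambiguous, which is the motivation for passing the monoid by name.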