Optimizing nested conjugated / transposed / scaled expressions #203
@mhoemmen wow you drafted this really fast!

Great pseudocode! It would be great to have something that works for arbitrarily nested conjugated/transposed/scaled (if we don't limit the recursion depth). One quick question: don't we need to check if it's Personally, I'd prefer to resolve the effective
@youyu3 wrote:
The body of the This design thus factors extracting information from
Absolutely, a good idea. That's what "[w]e can fix at least (2) by applying the recursive approach to each pair (
@fnrizzi wrote:
Awww :-) All these discussions with y'all have been helpful! I wish I had figured this out sooner so you wouldn't have had to come up with those patterns on your own.
Notes to implementers (e.g., @youyu3):

Note: the merged PR #238 fully implements People interested in this issue (e.g., @youyu3, @fnrizzi, @MikolajZuzek) should note that
Discussion on PR #197 and elsewhere (e.g., with @youyu3) shows that it's tricky to optimize expressions like `transposed(conjugated(scaled(alpha, A)))` that result from calling, e.g., `matrix_product`. "Optimize" here means "deduce that we can call an optimized BLAS routine." For `layout_left` `A`, the Fortran BLAS can handle this case directly by setting `TRANSA='C'` and `ALPHA=alpha` (for complex `alpha`, strictly `ALPHA=conj(alpha)`, because the conjugation also applies to the scaling factor).

It occurred to me that a "recursive" design could make this easier. I put "recursive" in quotes because it's based on function overloads: the calls to the function with the same name aren't actually recursive, because their arguments' types change on each nested call.
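The overload-based "recursion" described above can be illustrated with a small sketch. It is not the pseudocode from this issue: it assumes hypothetical toy wrappers (`plain_matrix`, `scaled_view`, `conjugated_view`, `transposed_view`) in place of `mdspan` with stdBLAS's accessors and layouts, and a hypothetical `extract` overload set that peels off one wrapper per call while accumulating the net transpose, conjugation, and scaling factor. Since it uses only function overloads (no `if constexpr`), it also compiles as C++14.

```cpp
#include <cassert>
#include <complex>
#include <utility>

using scalar = std::complex<double>;

// Hypothetical toy stand-ins, NOT the stdBLAS / std::linalg types:
// a plain column-major ("layout_left") matrix plus three wrappers that
// mimic scaled(alpha, A), conjugated(A), and transposed(A).
struct plain_matrix { const scalar* data; int rows; int cols; };
template <class Nested> struct scaled_view     { scalar alpha; Nested nested; };
template <class Nested> struct conjugated_view { Nested nested; };
template <class Nested> struct transposed_view { Nested nested; };

// What we learn about the expression while peeling off wrappers.
struct blas_params {
  bool trans = false;      // net transpose
  bool conj = false;       // net conjugation
  scalar alpha{1.0, 0.0};  // net scaling factor
};

// Forward declarations, so each overload can call the others.
template <class Nested> plain_matrix extract(transposed_view<Nested> A, blas_params& p);
template <class Nested> plain_matrix extract(conjugated_view<Nested> A, blas_params& p);
template <class Nested> plain_matrix extract(scaled_view<Nested> A, blas_params& p);

// Base case: no wrappers left, so hand back the underlying matrix.
inline plain_matrix extract(plain_matrix A, blas_params&) { return A; }

// Each overload peels one wrapper and calls extract on the nested object.
// The argument type changes on every call, so no function calls itself.
template <class Nested>
plain_matrix extract(transposed_view<Nested> A, blas_params& p) {
  p.trans = !p.trans;  // two transposes cancel
  return extract(std::move(A.nested), p);
}

template <class Nested>
plain_matrix extract(conjugated_view<Nested> A, blas_params& p) {
  p.conj = !p.conj;  // two conjugations cancel
  return extract(std::move(A.nested), p);
}

template <class Nested>
plain_matrix extract(scaled_view<Nested> A, blas_params& p) {
  // A conjugation applied outside this scaling also conjugates alpha:
  // conj(alpha * a) == conj(alpha) * conj(a).
  p.alpha *= p.conj ? std::conj(A.alpha) : A.alpha;
  return extract(std::move(A.nested), p);
}

// Map the accumulated flags to the Fortran BLAS TRANSA argument for a
// column-major base matrix. Returns '\0' for "conjugate, no transpose",
// which GEMM's TRANSA cannot express, so the caller must fall back.
inline char transa_for(const blas_params& p) {
  if (!p.trans) return p.conj ? '\0' : 'N';
  return p.conj ? 'C' : 'T';
}
```

For `transposed(conjugated(scaled(alpha, A)))`, peeling yields `trans == true`, `conj == true`, and a net factor of `conj(alpha)`, which `transa_for` maps to `TRANSA='C'`.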
Here's some pseudocode:
For a C++14-compatible implementation, one could use function overloads (playing the role of partial specialization) instead of `if constexpr`.

Here are some issues with the above approach.
`mdspan` once per "recursion" level.

We can fix at least (2) by applying the recursive approach to each pair `(A, A_data)` and `(B, B_data)`. This will bound the function call depth for the fall-back case.

Regarding (1), we can mitigate this by limiting the "recursion" depth for cases that the BLAS obviously can't handle. Also, taking the `mdspan` by value lets us move-construct the pointer, layout, and accessor at each level, which reduces the cost in the (admittedly unusual) case where any of them is expensive to construct.
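The two mitigations above can be sketched together, again with hypothetical toy wrappers standing in for `mdspan` and its accessors. Each operand of `C = A * B` is peeled on its own (the per-pair idea), every overload takes its argument by value and moves the nested object into the next call, and a hypothetical `max_depth` cap sends any deeper nesting, or any unknown wrapper, straight to a generic fallback (signaled here by `std::nullopt`). The names `peel`, `plan_product`, `gemm_plan`, and `max_depth` are illustrative assumptions, not stdBLAS API; the sketch uses `if constexpr` and `std::optional`, so it needs C++17.

```cpp
#include <cassert>
#include <complex>
#include <optional>
#include <utility>

using scalar = std::complex<double>;

// Hypothetical toy stand-ins, NOT the stdBLAS / std::linalg types.
struct plain_matrix { const scalar* data; };
template <class N> struct scaled_view     { scalar alpha; N nested; };
template <class N> struct conjugated_view { N nested; };
template <class N> struct transposed_view { N nested; };

// Everything GEMM needs to know about one operand, and about the product.
struct operand_info { const scalar* data; bool trans; bool conj; scalar alpha; };
struct gemm_plan { char transa; char transb; scalar alpha; };

// Hypothetical cap: peel at most this many wrappers per operand.
constexpr int max_depth = 3;

// Base case: reached the underlying matrix.
template <int Depth>
std::optional<operand_info> peel(plain_matrix A, bool trans, bool conjugate, scalar alpha) {
  return operand_info{A.data, trans, conjugate, alpha};
}
// Unknown wrapper: the BLAS can't handle it, so stop peeling right away.
template <int Depth, class Other>
std::optional<operand_info> peel(Other, bool, bool, scalar) { return std::nullopt; }

// Forward declarations so the wrapper overloads can call one another.
template <int Depth, class N> std::optional<operand_info> peel(transposed_view<N>, bool, bool, scalar);
template <int Depth, class N> std::optional<operand_info> peel(conjugated_view<N>, bool, bool, scalar);
template <int Depth, class N> std::optional<operand_info> peel(scaled_view<N>, bool, bool, scalar);

// Each wrapper is taken by value and its nested object is moved onward.
template <int Depth, class N>
std::optional<operand_info> peel(transposed_view<N> A, bool trans, bool conjugate, scalar alpha) {
  if constexpr (Depth >= max_depth) { return std::nullopt; }  // too deep: fall back
  else { return peel<Depth + 1>(std::move(A.nested), !trans, conjugate, alpha); }
}
template <int Depth, class N>
std::optional<operand_info> peel(conjugated_view<N> A, bool trans, bool conjugate, scalar alpha) {
  if constexpr (Depth >= max_depth) { return std::nullopt; }
  else { return peel<Depth + 1>(std::move(A.nested), trans, !conjugate, alpha); }
}
template <int Depth, class N>
std::optional<operand_info> peel(scaled_view<N> A, bool trans, bool conjugate, scalar alpha) {
  if constexpr (Depth >= max_depth) { return std::nullopt; }
  else {
    scalar factor = conjugate ? std::conj(A.alpha) : A.alpha;  // outer conj hits alpha too
    return peel<Depth + 1>(std::move(A.nested), trans, conjugate, alpha * factor);
  }
}

// GEMM's TRANSA/TRANSB accept 'N', 'T', or 'C'; "conjugate without
// transpose" has no encoding, so it also forces the fallback.
inline std::optional<char> trans_char(const operand_info& x) {
  if (!x.trans) {
    if (x.conj) return std::nullopt;
    return 'N';
  }
  return x.conj ? 'C' : 'T';
}

// Peel each operand separately, so each chain of calls is only as deep as
// that operand's own nesting, then combine the two results.
template <class ExprA, class ExprB>
std::optional<gemm_plan> plan_product(ExprA A, ExprB B) {
  auto a = peel<0>(std::move(A), false, false, scalar{1.0});
  auto b = peel<0>(std::move(B), false, false, scalar{1.0});
  if (!a || !b) return std::nullopt;
  auto ta = trans_char(*a);
  auto tb = trans_char(*b);
  if (!ta || !tb) return std::nullopt;
  return gemm_plan{*ta, *tb, a->alpha * b->alpha};  // the two scaling factors multiply
}
```

For `transposed(conjugated(scaled(alpha, A)))` times a plain `B`, `plan_product` gives `TRANSA='C'`, `TRANSB='N'`, and a net factor of `conj(alpha)`. A chain of four `transposed_view` wrappers is rejected by the cap even though the BLAS could handle it; that is the price of a crude depth limit, and a real implementation might pick the cutoff differently.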