As soon as the collective instance branch is ready for testing, we need to move to it and introduce new concepts in the lowerer and mapper to correctly handle creating collective instances. This is a three-phase process.
Simple "replicated tensor" computations, where we can remove all of the hard-coded manual replication. Codes that come to mind are:
SpMV weak scale
TTV
TTMC
MTTKRP
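To make the "replicated tensor" pattern concrete, here is a minimal sketch in plain Python, assuming a row-partitioned CSR SpMV in which every point task reads the whole input vector `x`. The function names (`partition_rows`, `spmv_piece`, `spmv`) are hypothetical and are not DISTAL or Legion APIs; the sketch only shows the data access pattern that a collective instance would serve instead of manual per-task copies.

```python
# Hypothetical sketch: illustrative names only, not DISTAL or Legion APIs.

def partition_rows(num_rows, num_pieces):
    """Split range(num_rows) into contiguous, nearly equal row blocks."""
    base, extra = divmod(num_rows, num_pieces)
    blocks, start = [], 0
    for p in range(num_pieces):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def spmv_piece(rows, indptr, indices, values, x):
    """One point task: computes y for its own row block, reading all of x."""
    return {
        r: sum(values[n] * x[indices[n]] for n in range(indptr[r], indptr[r + 1]))
        for r in rows
    }


def spmv(indptr, indices, values, x, num_pieces):
    """Simulated index launch: the matrix is partitioned, x is replicated."""
    num_rows = len(indptr) - 1
    y = [0] * num_rows
    for rows in partition_rows(num_rows, num_pieces):
        # Every piece receives the same (replicated) vector x.
        for r, v in spmv_piece(rows, indptr, indices, values, x).items():
            y[r] = v
    return y
```

Only the matrix is partitioned here; `x` is the tensor that every piece needs in full, which is the case the manual replication code currently handles.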
More complicated launch patterns where subsets of launches need pieces of tensors
Johnson's Algorithm
COSMA
2D matrix computations that do lock-step broadcast communications, such as SUMMA. This step will be the hardest, as it requires changing a lot of code. The problem with the current approach is that it does one 2D launch, and then each launched sub-task launches a bunch more tasks. We will likely need to convert this into a 3D launch with a projection functor that understands the ordering between tasks (also generated by DISTAL) and then chooses the collectives to use for each row/column.
PUMMA
SUMMA
2.5D MatMul
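As a rough illustration of the 3D launch idea, the sketch below flattens the 2D grid plus the inner step loop into points `(i, j, k)` and uses a projection function to decide which tile of each matrix a point task touches. It assumes a SUMMA-style access pattern (`A` tile `(i, k)`, `B` tile `(k, j)`, `C` tile `(i, j)`); `launch_domain`, `project`, and `sharing_groups` are hypothetical names, and a real Legion projection functor has a different interface.

```python
from collections import defaultdict
from itertools import product


def launch_domain(grid_x, grid_y, steps):
    """All points (i, j, k) of the flattened 3D launch, with k varying fastest."""
    return list(product(range(grid_x), range(grid_y), range(steps)))


def project(point):
    """Hypothetical projection: the tile of A, B and C used by one point task."""
    i, j, k = point
    return {"A": (i, k), "B": (k, j), "C": (i, j)}


def sharing_groups(points, tensor):
    """Group point tasks by the tile of `tensor` they use."""
    groups = defaultdict(list)
    for p in points:
        groups[project(p)[tensor]].append(p)
    return dict(groups)
```

Grouping by `"A"` yields one group per grid row and step, and grouping by `"B"` yields one per grid column and step; these are the sets of tasks that could share a collective. Grouping by `"C"` lists the points that update the same output tile, which must run in increasing `k` order.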
It's possible that for the third case, we can do something with the outer partitioning, rather than adjusting the launch space. We can just move the k loop to the outside, and do launches of index space tasks over the machine that use collectives!
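A minimal sketch of that alternative, assuming square matrices stored as lists of rows and a `grid` x `grid` machine with `bs` x `bs` tiles: the `k` loop is outermost, and each step is one simulated 2D index launch. All names (`tile`, `matmul`, `summa_k_outer`) are hypothetical, and the serial loops only stand in for what would be index task launches using collectives.

```python
def matmul(a, b):
    """Dense product of two small matrices stored as lists of rows."""
    return [
        [sum(a[r][t] * b[t][c] for t in range(len(b))) for c in range(len(b[0]))]
        for r in range(len(a))
    ]


def tile(m, rows, cols):
    """Copy out the sub-matrix of m covering the given row and column ranges."""
    return [[m[r][c] for c in cols] for r in rows]


def summa_k_outer(a, b, grid, bs):
    """SUMMA-style product with the k loop outside the 2D launch."""
    n = grid * bs
    c = [[0] * n for _ in range(n)]
    blocks = [range(g * bs, (g + 1) * bs) for g in range(grid)]
    for k in range(grid):  # k loop moved to the outside
        # One 2D index launch over the machine per step k.
        for i in range(grid):
            a_ik = tile(a, blocks[i], blocks[k])  # shared by every j in row i
            for j in range(grid):
                b_kj = tile(b, blocks[k], blocks[j])  # shared by every i in column j
                partial = matmul(a_ik, b_kj)
                # Accumulate the partial product into tile (i, j) of C.
                for r_local, r in enumerate(blocks[i]):
                    for c_local, col in enumerate(blocks[j]):
                        c[r][col] += partial[r_local][c_local]
    return c
```

Within a single step `k`, the tile `a_ik` is read by a whole grid row and `b_kj` by a whole grid column, so the launch space stays 2D and the sharing is visible per launch.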