[Transpiler] relax stensors' innermost dimension's alignment to reduce shared memory usage #131
Labels: CUDA Transpiler (issues and features related to the CUDA transpiler of Mirage), enhancement (new feature or request)
Currently, the transpiler requires 16-byte alignment for the innermost dimension of all stensors (see mirage/src/transpiler/resolve_tensor_layout.cc, lines 75 to 108 at commit 262b101).

We should relax this constraint and enforce the alignment only for stensors used by operators involving `cp.async`, `ldmatrix`, or other instructions that require such alignment. Stensors that never feed those instructions would then need no padding, which reduces shared memory usage.
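
A minimal sketch of the proposed rule, not the actual code in resolve_tensor_layout.cc. It assumes the alignment is met by padding the innermost dimension, and every name in it (`OpKind`, `needs_16B_alignment`, `padded_inner_dim`) is hypothetical and exists only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical operator kinds; NOT Mirage's actual operator enum.
enum class OpKind { CpAsync, Ldmatrix, Elementwise, Reduction };

// Assumption: only these instruction families need 16-byte-aligned rows.
constexpr bool needs_16B_alignment(OpKind k) {
  return k == OpKind::CpAsync || k == OpKind::Ldmatrix;
}

// Round n up to the nearest multiple of m (m > 0).
constexpr std::size_t round_up(std::size_t n, std::size_t m) {
  return (n + m - 1) / m * m;
}

// Returns the innermost dimension (in elements) after padding.
// The 16-byte constraint applies only if some user of the stensor needs it;
// otherwise the dimension is left unpadded.
std::size_t padded_inner_dim(std::size_t inner_dim, std::size_t elem_bytes,
                             const std::vector<OpKind>& users) {
  // This sketch only handles element sizes that divide 16 bytes.
  assert(elem_bytes > 0 && 16 % elem_bytes == 0);
  bool strict = false;
  for (OpKind k : users) {
    strict = strict || needs_16B_alignment(k);
  }
  const std::size_t align_elems = strict ? 16 / elem_bytes : 1;
  return round_up(inner_dim, align_elems);
}
```

Under these assumptions, a row of 33 half-precision elements (2 bytes each) is padded to 40 elements (80 bytes) if any user is a `cp.async` or `ldmatrix` operator, and stays at 33 elements (66 bytes) otherwise.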