[Transpiler] relax stensors' innermost dimension's alignment to reduce shared memory usage #131

Open · jiazhihao opened this issue on Oct 30, 2024 · 0 comments
Labels: CUDA Transpiler (Issues and features related to the CUDA transpiler of Mirage), enhancement (New feature or request)

@jiazhihao (Member) commented:

Currently, the transpiler requires 16-byte alignment for the innermost dimension of all stensors:

```cpp
void calc_tensor_strides(size_t strides[],
                         size_t &num_phy_elems,
                         int num_dims,
                         int const dims[],
                         int innermost_dim,
                         int datatype_size) {
  // An order of dimensions. We layout elements according to this order
  vector<int> dim_order = {innermost_dim};
  for (int i = num_dims - 1; i >= 0; --i) {
    if (i != innermost_dim) {
      dim_order.push_back(i);
    }
  }
  // Alignment in elements, so a padded extent is a multiple of 16 bytes
  size_t alignment = std::max(16 / datatype_size, 1);
  size_t cur_stride = 1;
  bool encountered_non1_dim = false;
  for (int dim_idx : dim_order) {
    int cur_dim = dims[dim_idx];
    strides[dim_idx] = cur_stride;
    if (cur_dim != 1) {
      if (!encountered_non1_dim) {
        // The first non-size-1 dimension in layout order is rounded up
        cur_stride *= round_to_multiple((size_t)cur_dim, alignment);
        encountered_non1_dim = true;
      } else {
        cur_stride *= cur_dim;
      }
    }
  }
  if (cur_stride == 1) {
    // There is only one element in the tensor, we need to pad it to 16B
    cur_stride = alignment;
  }
  num_phy_elems = cur_stride;
}
```
This padding may result in more shared memory usage than necessary.
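To make the cost concrete, here is a standalone sketch of the current padding rule (the fp16 64×3 shape is an illustrative assumption, not from the issue): with 2-byte elements the alignment is 16 / 2 = 8 elements, so an innermost extent of 3 is padded to 8 and well over half of the allocation is padding.

```cpp
#include <cstdio>

// Standalone illustration of the current padding rule (mirrors
// calc_tensor_strides above; not part of Mirage).
int main() {
  int datatype_size = 2;                 // e.g. fp16
  int dims[2] = {64, 3};                 // innermost_dim = 1
  size_t alignment = 16 / datatype_size; // 8 elements per 16 bytes
  // The innermost extent 3 is rounded up to 8
  size_t padded_innermost =
      (dims[1] + alignment - 1) / alignment * alignment;
  size_t padded_elems = dims[0] * padded_innermost; // 512 elements
  size_t minimal_elems = (size_t)dims[0] * dims[1]; // 192 elements
  std::printf("padded: %zu, minimal: %zu (%.1f%% wasted)\n",
              padded_elems, minimal_elems,
              100.0 * (padded_elems - minimal_elems) / padded_elems);
  return 0;
}
```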

We should relax this constraint and enforce this alignment only for operators involving cp.async, ldmatrix, and other instructions that require it.
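A minimal sketch of one possible relaxation, assuming the caller knows which alignment an stensor actually needs (the required_alignment_bytes parameter is hypothetical and not an agreed design; round_to_multiple is the same helper used above):

```cpp
// Hypothetical sketch: take the required alignment (in bytes) as a
// parameter instead of hard-coding 16. Operators lowered to cp.async /
// ldmatrix would pass 16; everything else could pass datatype_size,
// which avoids padding the innermost dimension at all.
void calc_tensor_strides_relaxed(size_t strides[],
                                 size_t &num_phy_elems,
                                 int num_dims,
                                 int const dims[],
                                 int innermost_dim,
                                 int datatype_size,
                                 int required_alignment_bytes) {
  vector<int> dim_order = {innermost_dim};
  for (int i = num_dims - 1; i >= 0; --i) {
    if (i != innermost_dim) {
      dim_order.push_back(i);
    }
  }
  size_t alignment =
      std::max(required_alignment_bytes / datatype_size, 1);
  size_t cur_stride = 1;
  bool encountered_non1_dim = false;
  for (int dim_idx : dim_order) {
    int cur_dim = dims[dim_idx];
    strides[dim_idx] = cur_stride;
    if (cur_dim != 1) {
      if (!encountered_non1_dim) {
        cur_stride *= round_to_multiple((size_t)cur_dim, alignment);
        encountered_non1_dim = true;
      } else {
        cur_stride *= cur_dim;
      }
    }
  }
  // Even a single-element tensor must satisfy the requested alignment
  num_phy_elems = std::max(cur_stride, alignment);
}
```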

@jiazhihao added the CUDA Transpiler and enhancement labels on Oct 30, 2024
@jiazhihao moved this to Todo in CUDA Transpiler on Oct 30, 2024