
Extremely low running time when profiling transpiled muGraphs #97

Open
wmdi opened this issue Oct 4, 2024 · 1 comment
Labels: bug (Something isn't working)


@wmdi (Collaborator) commented Oct 4, 2024

When profiling transpiled muGraphs, some of the measured running times are extremely low, close to the kernel launch time alone. For example, in the gated_mlp example, some muGraphs take only ~0.004 ms on the catalyst cluster. This suggests that some kernel launches in the generated CUDA programs are failing rather than running fast.
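One way to tell a failed launch apart from a genuinely fast kernel is to check for CUDA errors around the timed region instead of trusting the elapsed time alone. The sketch below is not taken from the project's profiler; `timed_launch`, `looks_like_launch_overhead`, and the 0.01 ms threshold are hypothetical names and values chosen for illustration. It uses only standard CUDA runtime calls (`cudaEventRecord`, `cudaGetLastError`, `cudaEventSynchronize`, `cudaEventElapsedTime`).

```cuda
#include <cuda_runtime.h>

// Hypothetical threshold (not from the project): timings at or below this are
// in the range of bare kernel-launch overhead and deserve a closer look.
constexpr float kLaunchOverheadMs = 0.01f;

// True if a measured duration is too small to distinguish from launch overhead.
bool looks_like_launch_overhead(float ms) { return ms <= kLaunchOverheadMs; }

// Hypothetical helper: times `launch` (any callable that enqueues work on the
// default stream) with CUDA events and returns the first launch or execution
// error, so that a rejected launch is not mistaken for a very fast kernel.
template <typename Launch>
cudaError_t timed_launch(Launch launch, float *ms_out) {
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  cudaEventRecord(start);
  launch();
  // Launch-configuration errors (e.g. requesting more shared memory than the
  // device allows) are reported right after the launch call.
  cudaError_t err = cudaGetLastError();
  cudaEventRecord(stop);

  // Errors raised while the kernel executes surface on synchronization.
  cudaError_t sync_err = cudaEventSynchronize(stop);
  if (err == cudaSuccess) err = sync_err;

  // For a rejected launch the events are still recorded, so this typically
  // reports a tiny elapsed time, which is why the error must be checked.
  cudaEventElapsedTime(ms_out, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return err;
}
```

A call might look like `timed_launch([&] { kernel<<<grid, block, smem_bytes>>>(out); }, &ms)`, where `kernel`, `grid`, `block`, `smem_bytes`, and `out` stand in for one generated kernel's launch. A result other than `cudaSuccess` means the timing should be discarded rather than reported.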

jiazhihao added the bug label on Oct 15, 2024
@jiazhihao (Member)

I suspect this is because some of the generated CUDA kernels fail to execute on A5000 GPUs. One possibility is that the shared memory (smem) they require exceeds the hardware limit. @xinhaoc Can you take a look at the gated_mlp issue on A5000?
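To test the shared-memory hypothesis, a generated kernel's dynamic smem request can be compared against the device's per-block opt-in limit before launching. For reference, the CUDA C++ Programming Guide lists a maximum of 99 KB (101376 bytes) of shared memory per thread block for compute capability 8.6, the architecture of the A5000. The sketch below is a hypothetical helper, not part of the project's code generator; `smem_request_fits` and `prepare_dynamic_smem` are illustrative names. The CUDA calls it uses (`cudaDeviceGetAttribute` with `cudaDevAttrMaxSharedMemoryPerBlockOptin`, and `cudaFuncSetAttribute` with `cudaFuncAttributeMaxDynamicSharedMemorySize`) are standard runtime APIs.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical pure helper: does a dynamic shared-memory request fit under the
// device's per-block opt-in limit?
bool smem_request_fits(std::size_t requested_bytes, int max_optin_bytes) {
  return max_optin_bytes >= 0 &&
         requested_bytes <= static_cast<std::size_t>(max_optin_bytes);
}

// Hypothetical helper: queries the per-block opt-in shared memory limit of
// `device`; if the request fits, raises the kernel's dynamic smem cap so the
// launch is allowed, otherwise reports the mismatch instead of launching.
template <typename Kernel>
cudaError_t prepare_dynamic_smem(Kernel *kernel, std::size_t requested_bytes,
                                 int device) {
  int max_optin = 0;
  cudaError_t err = cudaDeviceGetAttribute(
      &max_optin, cudaDevAttrMaxSharedMemoryPerBlockOptin, device);
  if (err != cudaSuccess) return err;

  if (!smem_request_fits(requested_bytes, max_optin)) {
    std::fprintf(stderr,
                 "kernel needs %zu bytes of shared memory; device %d allows %d\n",
                 requested_bytes, device, max_optin);
    return cudaErrorInvalidValue;
  }

  // Dynamic shared memory above 48 KB must be opted in explicitly per kernel.
  return cudaFuncSetAttribute(kernel,
                              cudaFuncAttributeMaxDynamicSharedMemorySize,
                              static_cast<int>(requested_bytes));
}
```

Querying the limit at run time, rather than hard-coding a value, keeps the check valid when the same transpiled muGraph is run on GPUs with different shared-memory capacities.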

3 participants