CUDA: for each PR, show delta register usage for GPU kernels #765

Open
BenWibking opened this issue Oct 1, 2024 · 1 comment
Labels
CI enhancement New feature or request github_actions Pull requests that update GitHub Actions code

Comments

@BenWibking
Collaborator

BenWibking commented Oct 1, 2024

Describe the proposal
List the GPU kernels with changed register usage as a comment in each PR.

This can be done by compiling with the --ptxas-options=-v flag and then parsing the compiler output with sed or another text parser. Example output:

ptxas info    : Compiling entry function 'searchkernel(octree, int*, double, int, double*, double*, double*)' for 'sm_20'
ptxas info    : Function properties for searchkernel(octree, int*, double, int, double*, double*, double*)
    72 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 46 registers, 176 bytes cmem[0], 16 bytes cmem[14]

We can parse the above into a *.csv file that consists of two columns: kernel name and register usage:

searchkernel 46

and then diff it against a reference file computed using the current development branch.
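A minimal sketch of the diff step, in Python rather than sed. It assumes the two files are comma-separated (the example row above uses a space) and that each row is a kernel name followed by a register count. The function names and the numbers in the sample data are made up for illustration.

```python
import csv
import io


def load_register_csv(text):
    """Read two-column rows (kernel name, register count) into a dict."""
    reader = csv.reader(io.StringIO(text))
    return {row[0]: int(row[1]) for row in reader if row}


def register_deltas(reference, current):
    """Return (kernel, old, new) for every kernel whose count differs.

    old or new is None when the kernel exists on only one side.
    """
    changes = []
    for kernel in sorted(set(reference) | set(current)):
        old = reference.get(kernel)
        new = current.get(kernel)
        if old != new:
            changes.append((kernel, old, new))
    return changes


# Made-up numbers for illustration only.
reference_csv = "searchkernel,46\nsearchkernelxyz,256\n"
current_csv = "searchkernel,52\nsearchkernelxyz,256\nnewkernel,32\n"

for kernel, old, new in register_deltas(
    load_register_csv(reference_csv), load_register_csv(current_csv)
):
    print(f"{kernel}: {old} -> {new}")
```

Only changed, added, or removed kernels are reported, which keeps the PR comment short when most kernels are unaffected.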

Describe alternatives you've considered
Profile GPU performance directly for each PR. This is tricky to do on a GPU kernel-by-kernel basis.

Additional context
GPU performance for our code is exquisitely sensitive to register usage, so register pressure is a good predictor of performance. This metric should tell us whether a PR introduces a major performance regression on GPU.

See also: https://stackoverflow.com/questions/12388207/interpreting-the-verbose-output-of-ptxas-part-i

@BenWibking BenWibking added CI enhancement New feature or request github_actions Pull requests that update GitHub Actions code labels Oct 1, 2024
@BenWibking BenWibking changed the title CUDA: show delta register usage for GPU kernels CUDA: for each PR, show delta register usage for GPU kernels Oct 1, 2024
@BenWibking
Collaborator Author

Perl one-liner from Weiqun:

$ perl -ne 'if (/Compiling entry function (.*?) for '\''sm_/) { print "$1 "; } elsif (/Used (\d+) registers/) { print " $1\n"; }' foo.txt
'searchkernelxyz(octree, int*, double, int, double*, double*, double*)'  256
'searchkernel(octree, int*, double, int, double*, double*, double*)'  46

$ cat foo.txt
111

222
ptxas info    : Compiling entry function 'searchkernelxyz(octree, int*, double, int, double*, double*, double*)' for 'sm_20'
ptxas info    : Function properties for searchkernel(octree, int*, double, int, double*, double*, double*)
    72 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 256 registers, 176 bytes cmem[0], 16 bytes cmem[14]
sss
ddd
ptxas info    : Compiling entry function 'searchkernel(octree, int*, double, int, double*, double*, double*)' for 'sm_20'
ptxas info    : Function properties for searchkernel(octree, int*, double, int, double*, double*, double*)
    72 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 46 registers, 176 bytes cmem[0], 16 bytes cmem[14]
xxxx
yyy
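For comparison, here is a rough Python equivalent of the one-liner, as a sketch only. It assumes the log looks like foo.txt above: each "Compiling entry function" line is eventually followed by a "Used N registers" line for the same kernel. The helper short_name is hypothetical; it drops the argument list to get the bare kernel name used in the proposed two-column file, and would need more care for templated kernels.

```python
import re

ENTRY_RE = re.compile(r"Compiling entry function '(.*)' for 'sm_")
USED_RE = re.compile(r"Used (\d+) registers")

SAMPLE_LOG = """\
111
ptxas info    : Compiling entry function 'searchkernelxyz(octree, int*, double, int, double*, double*, double*)' for 'sm_20'
    72 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 256 registers, 176 bytes cmem[0], 16 bytes cmem[14]
sss
ptxas info    : Compiling entry function 'searchkernel(octree, int*, double, int, double*, double*, double*)' for 'sm_20'
    72 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 46 registers, 176 bytes cmem[0], 16 bytes cmem[14]
yyy
"""


def parse_ptxas_log(lines):
    """Map each entry function to the register count reported after it."""
    usage = {}
    current = None
    for line in lines:
        entry = ENTRY_RE.search(line)
        if entry:
            current = entry.group(1)
            continue
        used = USED_RE.search(line)
        if used and current is not None:
            usage[current] = int(used.group(1))
            current = None
    return usage


def short_name(signature):
    """Hypothetical helper: keep only the text before the argument list."""
    return signature.split("(", 1)[0]


for signature, registers in parse_ptxas_log(SAMPLE_LOG.splitlines()).items():
    print(f"{short_name(signature)},{registers}")
```

Unrelated lines in the log are skipped, as in the Perl version, and a "Used" line with no preceding entry function is ignored rather than attributed to the wrong kernel.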
