Use CUDA driver APIs to avoid scheduling too large blocks #690

Open. Wants to merge 1 commit into base: main

apaszke (Collaborator) commented Nov 15, 2021

We sometimes emit kernels that need so many registers per thread that they
cannot be launched with blocks of 1024 threads. This change uses a CUDA driver
API to query for a good block size. In the future we might want to cache this
number to avoid the overhead of the driver call.
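
To illustrate why a register-heavy kernel cannot use 1024-thread blocks, here is a rough sketch of the arithmetic, not the code in this PR. It assumes a budget of 65536 registers per block (typical, but device-dependent) and ignores the driver's allocation granularity. The name `max_block_size` is hypothetical. In practice the driver answers this question itself, for example through `cuOccupancyMaxPotentialBlockSize` or the `CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` function attribute; the description does not say which call the PR uses. The `lru_cache` decorator stands in for the caching idea mentioned above.

```python
from functools import lru_cache

WARP_SIZE = 32          # threads per warp on current NVIDIA GPUs
HW_MAX_BLOCK = 1024     # hardware cap on threads per block
REGS_PER_BLOCK = 65536  # ASSUMPTION: register budget per block, device-dependent


@lru_cache(maxsize=None)  # cache results so repeated queries are free
def max_block_size(regs_per_thread, regs_per_block=REGS_PER_BLOCK):
    """Estimate the largest launchable block size for a kernel.

    Simplified model: every thread in the block needs `regs_per_thread`
    registers, and the block as a whole may not exceed `regs_per_block`.
    """
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    # Threads that fit in the register budget.
    fit = regs_per_block // regs_per_thread
    # Round down to a whole number of warps.
    fit = (fit // WARP_SIZE) * WARP_SIZE
    # Never exceed the hardware limit.
    return min(fit, HW_MAX_BLOCK)


if __name__ == "__main__":
    for regs in (64, 96, 128, 255):
        print(regs, "registers/thread ->", max_block_size(regs), "threads/block")
```

Under these assumptions, a kernel using 64 registers per thread still fits in a 1024-thread block, while one using 96 is limited to 672 threads, so launching it with a hard-coded block size of 1024 would fail.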

google-cla bot added the cla: yes label Nov 15, 2021
apaszke added the kokoro:force-run (Trigger for GPU CI) label Nov 17, 2021
kokoro-team removed the kokoro:force-run (Trigger for GPU CI) label Nov 17, 2021