Use CUDA driver APIs to avoid scheduling too large blocks #690

apaszke · 2021-11-15T12:50:50Z

We sometimes emit kernels that require so many registers that they cannot
be scheduled in blocks of 1024 threads. This change uses a CUDA driver API
to query for a good block size instead. In the future we might want to
cache this number to avoid any driver-related overhead.
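
To illustrate why a register-heavy kernel can fail to launch with 1024 threads per block, here is a rough sketch of the register-budget arithmetic. It is not the code in this PR, which asks the driver instead of computing anything itself (the driver exposes queries such as `cuOccupancyMaxPotentialBlockSize` and `cuFuncGetAttribute` with `CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK`; the description does not say which call is used). The helper name `max_block_size` and the defaults of 65536 registers per block and 1024 threads per block are assumptions that match many recent NVIDIA GPUs but not all of them, and the model ignores register allocation granularity and shared memory limits.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def max_block_size(regs_per_thread, regs_per_block=65536, hw_max_threads=1024):
    """Estimate the largest launchable block size for a kernel.

    Hypothetical helper for illustration only. The defaults are assumed
    hardware limits; real code should query the driver instead.
    """
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    # Threads that fit into the per-block register budget.
    by_registers = regs_per_block // regs_per_thread
    # Never exceed the hardware limit on threads per block.
    limit = min(by_registers, hw_max_threads)
    # Round down to a whole number of warps.
    return (limit // WARP_SIZE) * WARP_SIZE


if __name__ == "__main__":
    for regs in (32, 64, 96, 128, 255):
        print(f"{regs:3d} registers/thread -> block size {max_block_size(regs)}")
```

Under these assumptions a kernel using 64 registers per thread still fits in a 1024-thread block, while one using 128 registers per thread is limited to 512 threads, which is the situation the driver query is meant to detect.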

google-cla bot added the cla: yes label Nov 15, 2021

apaszke added the kokoro:force-run Trigger for GPU CI label Nov 17, 2021

kokoro-team removed the kokoro:force-run Trigger for GPU CI label Nov 17, 2021

apaszke force-pushed the main branch from 46b8727 to 8db43fc Compare May 13, 2022 14:51
