# cudnn frontend v1.8 release notes (#118)
## New API

### Paged Attention API

The SDPA forward operation now supports paged attention on cudnn 9.5.0 and later by setting the appropriate page-table descriptors. `SDPA_attributes` now accepts `set_paged_attention_k_table` and `set_paged_attention_v_table` to pass these descriptors. Please refer to the samples for usage: [cpp samples](samples/cpp/sdpa/fp16_fwd_with_paged_caches.cpp), [python samples](samples/python/52_scaled_dot_product_attention_with_paged_caches.ipynb). See [docs](docs/operations/Attention.md) for more API details.

### cuda Graph API

The cudnn graph now allows the user to directly build a native cuda_graph for a given sub_graph (requires cudnn 9.5.0). There are two APIs:

- `populate_cuda_graph`: adds the cudnn nodes to the empty cuda_graph provided as input.
- `update_cuda_graph`: updates the populated cuda graph with the necessary data pointers.

See [docs](docs/cuda_graphs.md) and [backend documentation](https://docs.nvidia.com/deeplearning/cudnn/latest/api/cudnn-graph-library.html#cudnnbackendpopulatecudagraph) for more details.

### Enhancements

- Kernel cache for dynamic shapes is now supported in python. Added a [sample](test/python/test_kernel_cache.py) to showcase usage.
- `graph.deselect_engines(str: )` now has a python equivalent through pybind11.
- `graph.tensor(...)` can now accept `int64_t` scalars directly. (Previously limited to `int32_t`, float and fp16 data types.)
- fp8 sdpa attention now allows dropout and padding mask. Requires cudnn 9.5.0 and above.
- More enhancements to pointwise output stride inference (for broadcast operations). For non-unary operands, the broadcasted tensor can now be at either IN_0 or IN_1.
- SDPA backward operation now allows d up to 256 for Hopper. Requires cudnn 9.5.0 and above.

### Bug fixes

- Fixed an issue while querying `cudnnGetLastErrorString()` from the backend. The `error_t` object will now have a more meaningful message.
- Fixed build issues seen with the clang-19 compiler.
- Fixed an issue where it was assumed a graph with bias in sdpa_bprop will always have a dbias.
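
As a rough illustration of the page-table idea behind the Paged Attention API entry above, the sketch below models a paged K cache in plain Python. It does not call cudnn. The names `gather_paged`, `container`, `page_table` and the block size of 4 are assumptions made for this example, and the layout is deliberately simplified compared with whatever tensor layout the real descriptors use. It only shows how a table of physical page indices lets a logically contiguous sequence live in non-contiguous pages.

```python
# Toy model of a paged K cache (illustrative only, no cudnn calls).
# All names and the block size below are assumptions for this sketch.
BLOCK_SIZE = 4  # tokens stored per physical page (assumed value)


def gather_paged(container, page_table, seq_len, block_size=BLOCK_SIZE):
    """Rebuild the logical, contiguous sequence from non-contiguous pages."""
    out = []
    for pos in range(seq_len):
        # Split the logical position into (logical page, offset in page).
        logical_block, offset = divmod(pos, block_size)
        # The page table maps the logical page to a physical page.
        physical_block = page_table[logical_block]
        out.append(container[physical_block][offset])
    return out


# Physical pool of 4 pages; each string stands in for one token's K vector.
container = [
    ["k8", "k9", None, None],    # physical page 0 (partially filled)
    ["k0", "k1", "k2", "k3"],    # physical page 1
    [None, None, None, None],    # physical page 2 (unused)
    ["k4", "k5", "k6", "k7"],    # physical page 3
]

# Logical pages 0, 1, 2 live in physical pages 1, 3, 0 respectively.
page_table = [1, 3, 0]

logical_k = gather_paged(container, page_table, seq_len=10)
```

In this toy setup `logical_k` lists the ten tokens in their logical order even though the pages are scattered in the pool. The same indirection applies to the V cache, which is why the release note mentions separate K and V page tables.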
Showing 125 changed files with 3,655 additions and 1,184 deletions.
@@ -0,0 +1,31 @@

### `populate_cuda_graph`

The `populate_cuda_graph` function is a member function of the `Graph` class. It adds the cuDNN nodes for the graph to an empty CUDA graph provided by the caller.

#### Parameters

- `handle`: A cuDNN handle.
- `uid_to_device_ptrs`: A map of tensor UIDs to device pointers.
- `workspace`: A pointer to the workspace memory.
- `cudnn_cuda_graph`: A pointer to the CUDA graph to populate (empty on input).

#### Return Value

- An `error_t` object indicating the success or failure of the function.

### `update_cuda_graph`

The `update_cuda_graph` function is a member function of the `Graph` class. It updates a previously populated CUDA graph with the necessary data pointers.

#### Parameters

- `handle`: A cuDNN handle.
- `uid_to_device_ptrs`: A map of tensor UIDs to device pointers.
- `workspace`: A pointer to the workspace memory.
- `cudnn_cuda_graph`: A pointer to the previously populated CUDA graph.

#### Return Value

- An `error_t` object indicating the success or failure of the function.
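
The call order implied by the two functions above (populate an empty graph once, then update its data pointers) can be modeled with a small stand-in. This is only a sketch of the pattern, not the real API. `MockCudaGraph` and `MockCudnnGraph` are hypothetical classes invented for this example, the return values are plain strings rather than `error_t` objects, and no cuDNN or CUDA call is made. The parameter names mirror the lists above.

```python
# Hypothetical stand-ins that model the populate-then-update call order.
# Nothing here touches cuDNN or CUDA; every class is a mock for illustration.
class MockCudaGraph:
    def __init__(self):
        self.nodes = []        # stands in for the graph's kernel nodes
        self.pointers = {}     # tensor UID -> device pointer
        self.workspace = None  # workspace pointer bound to the nodes


class MockCudnnGraph:
    def __init__(self, node_names):
        self.node_names = list(node_names)

    def populate_cuda_graph(self, handle, uid_to_device_ptrs, workspace,
                            cudnn_cuda_graph):
        # The release notes describe the input graph as empty.
        if cudnn_cuda_graph.nodes:
            return "error: cuda graph is not empty"
        cudnn_cuda_graph.nodes.extend(self.node_names)
        cudnn_cuda_graph.pointers = dict(uid_to_device_ptrs)
        cudnn_cuda_graph.workspace = workspace
        return "ok"

    def update_cuda_graph(self, handle, uid_to_device_ptrs, workspace,
                          cudnn_cuda_graph):
        # Updating only makes sense after the graph has been populated.
        if not cudnn_cuda_graph.nodes:
            return "error: cuda graph was never populated"
        cudnn_cuda_graph.pointers = dict(uid_to_device_ptrs)
        cudnn_cuda_graph.workspace = workspace
        return "ok"


cudnn_graph = MockCudnnGraph(["sdpa_fwd"])
cuda_graph = MockCudaGraph()

# Populate once with the first set of (made-up) device addresses.
first = cudnn_graph.populate_cuda_graph(None, {1: 0x1000, 2: 0x2000}, 0x9000,
                                        cuda_graph)
# Later, swap in new buffers without rebuilding the node structure.
second = cudnn_graph.update_cuda_graph(None, {1: 0x3000, 2: 0x4000}, 0x9000,
                                       cuda_graph)
```

The point of the split is that the node structure is built once, while later calls only swap the pointers bound to those nodes. The emptiness check in the mock reflects the wording of the release notes; how the real functions report misuse is not stated in this document.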