Releases: NVIDIA/cudnn-frontend
cudnn FE 1.0 pre-release-4
[API change] `Scaled_dot_product_flash_attention_attributes` and `Scaled_dot_product_flash_attention_backward_attributes` now accept K and V tensors instead of K-transpose and V-transpose. This is a deviation from the backend API; the change was made based on feedback from multiple customers.
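For illustration, here is a minimal sketch of the updated call through the Python bindings; the method and keyword names (`pygraph`, `scaled_dot_product_flash_attention`, `q`/`k`/`v`) are assumptions based on the FE 1.0 samples and may differ slightly in this pre-release:

```python
import cudnn  # cudnn-frontend python bindings (assumed module name)

graph = cudnn.pygraph(
    io_data_type=cudnn.data_type.HALF,
    intermediate_data_type=cudnn.data_type.FLOAT,
    compute_data_type=cudnn.data_type.FLOAT,
)

b, h, s, d = 4, 12, 1024, 64
# K and V are now passed in their natural (B, H, S, D) layout; no explicit
# K-transpose / V-transpose tensors need to be created by the caller.
q = graph.tensor(name="Q", dim=[b, h, s, d], stride=[h * s * d, s * d, d, 1])
k = graph.tensor(name="K", dim=[b, h, s, d], stride=[h * s * d, s * d, d, 1])
v = graph.tensor(name="V", dim=[b, h, s, d], stride=[h * s * d, s * d, d, 1])

o, stats = graph.scaled_dot_product_flash_attention(
    name="sdpa", q=q, k=k, v=v, is_inference=False, use_causal_mask=True
)
```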
[New API] Added the `tensor_like` Python API, which accepts a DLPack-compatible tensor. This simplifies cudnn tensor creation.
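A hedged sketch of how `tensor_like` might be used with a DLPack-compatible framework tensor (a PyTorch CUDA tensor here); the dim, stride, and data type are expected to be inferred from the source tensor instead of being spelled out:

```python
import cudnn
import torch  # any DLPack-compatible framework; torch is just an example

graph = cudnn.pygraph()

x_gpu = torch.randn(8, 64, 56, 56, dtype=torch.half, device="cuda")

# Instead of graph.tensor(name=..., dim=..., stride=..., data_type=...),
# derive the cudnn tensor attributes directly from the framework tensor.
x = graph.tensor_like(x_gpu)
```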
[New Feature] Setting the `CUDNN_FRONTEND_ATTN_DP_WORKSPACE_LIMIT` environment variable allows choosing between different optimized cudnn backend kernels. See docs/operations/mha for more details.
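Since it is a plain environment variable, it can be set in the shell or from Python before the graph is built; the value below is only an illustrative assumption, see docs/operations/mha for the accepted values and their exact semantics:

```python
import os

# Hypothetical example value: a workspace limit in bytes that steers cudnn
# toward a backend kernel fitting within the limit (see docs/operations/mha).
os.environ["CUDNN_FRONTEND_ATTN_DP_WORKSPACE_LIMIT"] = str(256 * 1024 * 1024)
```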
[New Feature] Add RMSNorm and InstanceNorm forward and backward implementations.
[New Feature] Add alibi, padding, and layout support for the attention bprop node.
[New Feature] Introduce Python bindings for plans. These allow validating the graph and filtering plans, as sketched below.
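A rough sketch of the intended flow; the plan-related method names (`build_operation_graph`, `create_execution_plans`, `check_support`, `build_plans`) are assumptions taken from later FE 1.x samples and may differ in this pre-release:

```python
import cudnn

graph = cudnn.pygraph(io_data_type=cudnn.data_type.FLOAT,
                      compute_data_type=cudnn.data_type.FLOAT)
a = graph.tensor(name="A", dim=[8, 32, 64], stride=[32 * 64, 64, 1])
b = graph.tensor(name="B", dim=[8, 32, 64], stride=[32 * 64, 64, 1])
c = graph.add(a=a, b=b)                       # pointwise add binding
c.set_output(True).set_data_type(cudnn.data_type.FLOAT)

graph.validate()                              # check the graph is well formed
graph.build_operation_graph()                 # lower to a backend operation graph
graph.create_execution_plans([cudnn.heur_mode.A, cudnn.heur_mode.FALLBACK])
graph.check_support()                         # filter plans the device cannot run
graph.build_plans()                           # finalize the remaining candidates
```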
[Bug Fix] Fix relative includes of filenames in cudnn_frontend headers. This resolves compilation issues in certain toolchains.
[Bug Fix] Fix a segfault that occurred when dropout was set for some scaled dot product flash attention nodes.
[New samples] Add new samples for apply_rope, layernorm forward and backward, and rmsnorm forward and backward.
cudnn FE 1.0 pre-release 3
Improvements over prerelease 2:
[Feature] Added SDPA flash attention backward node.
[Bug fix] Resolved an issue where the computed alibi slopes were copied onto GPU memory on the default stream instead of the user-specified stream in the handle.
[Bug fix] Fixed a Windows compilation error when pedantic warnings are treated as errors.
[Bug fix] Fixed an issue in causal padding where the masked values were `std::numeric_limits<float>::min()` instead of `std::numeric_limits<float>::lowest()`.
Under investigation and development:
- We are still working on additional features for SDPA back prop.
- Better error messages and logging
cudnn FE 1.0 pre-release 2
Release Notes:
Improvements over prerelease 1:
[Feature] Added missing python bindings for several pointwise ops.
[Feature] SDPA flash attention feature parity with the backend API.
[Bug fixes] Shape inference fixes for dgrad and wgrad, where the output dimensions cannot be computed deterministically.
Under investigation and development:
- We are still working on additional features for SDPA back prop.
- CPU overhead when using the python bindings is under investigation.
- Better error messages and logging
Miscellaneous updates to the v0.x API:
[Bug fix] Some tests were failing on Ampere GPUs because no plans with 0 size were available. This has been fixed.
[Bug fix] Median-of-three sampling was incorrectly sorting the results when cudnnFind was used. This has been fixed.
[Feature] The Layer Norm API has been added and can be used with the v0.x API.
This release is experimental.
v1.0-pre-release
cudnn_frontend v1.0 pre-release introduces a new API aimed at simplifying graph construction.
The purpose of this pre-release is to solicit feedback on the new API and gather requests for enhancement.
Please create a GitHub issue for any changes or enhancements you would like to see.
[New API] In the FE v1.0 API, users can describe multiple operations that form a subgraph through the cudnn_frontend::graph::Graph object. Unlike the FE v0.x API, users don't need to worry about specifying the shapes and sizes of intermediate virtual tensors. See README.FE.1.0.md for more details; a short sketch using the Python bindings follows the next item.
[New Feature] Python bindings for the FE 1.0 API. See the Python API section in README.md for building the python bindings. Details of the Python API and its keyword arguments are in README.FE.1.0.md. Python API samples are in samples/python/*.py.
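To make the graph-construction model concrete, here is a hedged end-to-end sketch using the Python bindings; names, keyword arguments, and the build/execute flow are assumptions based on the samples in samples/python and README.FE.1.0.md. Note that the intermediate matmul result is a virtual tensor whose shape is never specified by hand:

```python
import cudnn
import torch

graph = cudnn.pygraph(
    io_data_type=cudnn.data_type.HALF,
    intermediate_data_type=cudnn.data_type.FLOAT,
    compute_data_type=cudnn.data_type.FLOAT,
)

# Only graph inputs and outputs are described; intermediates stay virtual.
x = graph.tensor(name="X", dim=[8, 32, 64], stride=[32 * 64, 64, 1])
w = graph.tensor(name="W", dim=[8, 64, 16], stride=[64 * 16, 16, 1])
bias = graph.tensor(name="bias", dim=[1, 1, 16], stride=[16, 16, 1])

xw = graph.matmul(A=x, B=w)        # virtual intermediate, shape inferred
y = graph.add(a=xw, b=bias)
y.set_output(True).set_data_type(cudnn.data_type.HALF)

graph.validate()
graph.build([cudnn.heur_mode.A])   # heuristics-driven plan selection

x_gpu = torch.randn(8, 32, 64, dtype=torch.half, device="cuda")
w_gpu = torch.randn(8, 64, 16, dtype=torch.half, device="cuda")
b_gpu = torch.randn(1, 1, 16, dtype=torch.half, device="cuda")
y_gpu = torch.empty(8, 32, 16, dtype=torch.half, device="cuda")
workspace = torch.empty(graph.get_workspace_size(),
                        dtype=torch.uint8, device="cuda")

graph.execute({x: x_gpu, w: w_gpu, bias: b_gpu, y: y_gpu}, workspace)
```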
[Deprecation] The v0.x API is now labelled deprecated and may be removed in v2.0. Consider moving to the v1.0 API. If there are issues or missing features, please create a GitHub issue.
v0.9.2
v0.9.1
[Bug Fix] Updated version numbers of the cudnn frontend release.
[Update] Updated the documentation to reflect latest version numbers.
[Update] README updated with CMake build instructions.
[Samples] Added a new Batch Norm forward and backward sample.
v0.9
[Enhancement] Added the ability to filter by tensor shape to the errata filter.
[Enhancement] Added the ability to manually override the default feature vector in the opGraph.
[Enhancement] Added support for CUDNN_POINTWISE_RECIPROCAL pointwise operation.
[Enhancement] Added an option to limit the number of kernels benchmarked in find-plan.
[Bug Fix] Fixed "Scale Bias Conv BNGenstats" test case where the sum and square sum channel dimensions were incorrect.
[Bug Fix] Fixed a compiler error "dereferencing type-punned pointer will break strict-aliasing rules" seen with certain compilers when type-casting floating point alpha/beta values to int64_t.
[Bug Fix] Waived the "ConvScaleBiasAct_int8" sample on V100 because of the lack of int8 support.
[Samples] Added BF16/FP16/FP8 Flash Attention Fprop/Bprop samples.
v0.8.1
v0.8
[New API] Added support for Reshape operation.
[New API] Added support for the DgradDreluBNBwdWeight operation.
[Minor Enhancement] Added cudnn frontend enums to simplify Resample operation creation.
[Minor Enhancement] Added alpha and beta values as keys for the plan caches.
[Bug Fix] Fixed an error which was causing reference code to fail with a segmentation fault.
[Bug Fix] Fixed an issue where stride/padding and dilation values were incorrectly cached for 2d convolutions.
[Bug Fix] Fixed issues where error statuses were not handled correctly during tensor creation.
[Samples] Added a new sample to showcase how an fMHA graph can be programmed through the FE API. This sample contains both fprop and backprop graphs.
[Samples] Added a new sample to showcase the DgradDreluBNBwdWeight operation.
[Samples] Added a modular block which models the fprop of a ResNet residual block.
v0.7.3
Release Notes:
[Enhancement] Added a CUDNN_FRONTEND_VERSION macro to cudnn_frontend.
[Enhancement] Added the inline keyword to the get_plan functions to enable inclusion in multiple compilation units.
[Bug fix] Replaced CUDNN with CUDNN_VERSION as the correct macro name.