
Releases: NVIDIA/cudnn-frontend

cudnn FE 1.0 pre-release-4

19 Oct 03:29 · Pre-release

[API change] Scaled_dot_product_flash_attention_attributes and Scaled_dot_product_flash_attention_backward_attributes now accept K and V tensors instead of K-transpose and V-transpose. This is a deviation from the backend API, made in response to feedback from multiple customers.
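A minimal sketch of the new call shape through the python bindings (the tensor layout, keyword arguments, and dimensions below follow the repo's samples but should be treated as illustrative, not a verbatim signature):

```python
import cudnn

b, h, s, d = 4, 8, 512, 64  # batch, heads, sequence length, head size

graph = cudnn.pygraph(
    io_data_type=cudnn.data_type.HALF,
    intermediate_data_type=cudnn.data_type.FLOAT,
    compute_data_type=cudnn.data_type.FLOAT,
)

# K and V are now passed in their natural layout; no explicit
# K-transpose / V-transpose tensors are required anymore.
q = graph.tensor(name="Q", dim=[b, h, s, d], stride=[h * s * d, s * d, d, 1],
                 data_type=cudnn.data_type.HALF)
k = graph.tensor(name="K", dim=[b, h, s, d], stride=[h * s * d, s * d, d, 1],
                 data_type=cudnn.data_type.HALF)
v = graph.tensor(name="V", dim=[b, h, s, d], stride=[h * s * d, s * d, d, 1],
                 data_type=cudnn.data_type.HALF)

o, stats = graph.scaled_dot_product_flash_attention(
    q=q, k=k, v=v, is_inference=True, attn_scale=1.0 / d**0.5
)
o.set_output(True)
```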

[New API] Add a tensor_like python API which accepts a DLPack-compatible tensor. This simplifies cudnn tensor creation.
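For example, with a PyTorch tensor (PyTorch tensors are DLPack-compatible); a minimal sketch, assuming pygraph's default construction:

```python
import cudnn
import torch

graph = cudnn.pygraph()  # assuming default-constructed pygraph

# tensor_like reads dim, stride, and data type from the DLPack-compatible
# tensor, so they no longer have to be spelled out by hand.
x_gpu = torch.randn(8, 64, 56, 56, device="cuda", dtype=torch.float16)
X = graph.tensor_like(x_gpu)
```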

[New Feature] Setting the CUDNN_FRONTEND_ATTN_DP_WORKSPACE_LIMIT environment variable allows choosing between different optimized cudnn backend kernels. See docs/operations/mha for more details, and the sketch after this list.
[New Feature] Add RMSNorm and InstanceNorm forward and backward implementations.
[New Feature] Add alibi, padding, and layout support for the attention bprop node.
[New Feature] Introduce python bindings for plans, which allow validating the graph and filtering plans.
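For example, the workspace-limit variable from this list can be set before the first attention graph is built (the byte-count semantics assumed here should be verified against docs/operations/mha):

```python
import os

# Assumed semantics: an upper bound, in bytes, on the workspace the SDPA
# backward kernels may use; the frontend then chooses among the optimized
# backend kernels that fit this budget.
os.environ["CUDNN_FRONTEND_ATTN_DP_WORKSPACE_LIMIT"] = str(256 * 1024 * 1024)  # 256 MiB, illustrative
```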

[Bug Fix] Fix relative includes of filenames in cudnn_frontend headers. This resolves compilation issues in certain toolchains.
[Bug Fix] Fix a segfault when dropout was set for some scaled dot product flash attention nodes.

[New samples] Add new samples for apply_rope, layernorm forward and backward, and rmsnorm forward and backward.

cudnn FE 1.0 pre-release 3

25 Sep 16:15 · Pre-release

Improvements over prerelease 2:
[Feature] Added SDPA flash attention backward node.
[Bug fix] Resolved an issue where the computed Alibi slopes were copied to GPU memory on the default stream instead of the user-specified stream in the handle.
[Bug fix] Fixed a Windows compilation error when pedantic warnings are treated as errors.
[Bug fix] Fixed an issue in causal padding where the masked values were `std::numeric_limits<float>::min()` (the smallest positive normal float) instead of `std::numeric_limits<float>::lowest()` (the most negative finite float).

Under investigation and development:
- We are still working on additional features for SDPA backprop.
- Better error messages and logging.

cudnn FE 1.0 pre-release 2

13 Sep 22:29 · Pre-release

Release Notes:

Improvements over prerelease 1:
[Feature] Added missing python bindings for several pointwise ops.

[Feature] Achieved SDPA flash attention feature parity with the backend API.

[Bug fixes] Shape inference fixes for dgrad and wgrad, where the output dimensions cannot be computed deterministically (for example, with a stride of 2 several input sizes produce the same convolution output size, so the dgrad output shape is ambiguous unless specified).

Under investigation and development:

  • We are still working on additional features for SDPA backprop.
  • CPU overhead when using the python bindings is under investigation.
  • Better error messages and logging.

Miscellaneous updates to the v0.x API:

[Bug fix] Some tests were failing on Ampere GPUs because no plans with 0 size were available. This has been fixed.

[Bug fix] Median-of-three sampling was incorrectly sorting the results when cudnnFind was used. This has been fixed.

[Feature] A Layer Norm API has been added and can be used with the v0.x API.

This release is experimental.

v1.0-pre-release

15 Aug 18:17 · Pre-release
cudnn_frontend v1.0 prerelease introduces a new API aimed at simplifying graph construction.

The purpose of this pre-release is to solicit feedback on the new API and gather requests for enhancement.
Please create a github issue for any changes or enhancements you would like to see.

[New API] In the FE v1.0 API, users can describe multiple operations that
form a subgraph through the cudnn_frontend::graph::Graph object.
Unlike the FE v0.x API, users don't need to worry about specifying shapes
and sizes of the intermediate virtual tensors. See README.FE.1.0.md for
more details.

[New Feature] Python bindings for the FE 1.0 API. See the Python API
section in README.md for building the python bindings. Details of the python
API and its keyword arguments are in README.FE.1.0.md, and python API samples
are in samples/python/*.py. A minimal sketch of the flow is shown below.

[Deprecation] The v0.x API is now labelled deprecated and may be removed in v2.0.
Consider moving to the v1.0 API. If there are issues or missing features, please create a
github issue.

v0.9.2

13 Jul 22:20 · 12f35fa

[Update] Updated the samples so that they build with the CUDA 12.2 toolkit.
[Bug Fix] Fixed bugs in the MHA Bprop sample. This restores support for the sample in cuDNN versions 8.9.3 and up.

v0.9.1

23 May 16:15 · a4f05c1

[Bug Fix] Updated version numbers of the cudnn frontend release.
[Update] Updated the documentation to reflect latest version numbers.
[Update] Updated the README with CMake build instructions.

[Samples] Added a new Batch Norm forward and backward sample.

v0.9

18 Apr 17:42 · e7f6439

[Enhancement] Added the ability to filter by tensor shapes in the errata filter.
[Enhancement] Added the ability to manually override the default feature vector in the opGraph.
[Enhancement] Added support for CUDNN_POINTWISE_RECIPROCAL pointwise operation.
[Enhancement] Added an option to limit the number of kernels benchmarked in find-plan.

[Bug Fix] Fixed "Scale Bias Conv BNGenstats" test case where the sum and square sum channel dimensions were incorrect.
[Bug Fix] Fixed a compiler error, "dereferencing type-punned pointer will break strict-aliasing rules", seen with certain compilers while type-casting floating point alpha/beta to int64_t.
[Bug Fix] Waived the "ConvScaleBiasAct_int8" sample for V100 because of its lack of int8 support.

[Samples] Added BF16/FP16/FP8 Flash Attention Fprop/Bprop samples.

v0.8.1

07 Apr 19:41 · 1e32f72

[Minor Enhancement] Added missing enum handling code, which allows forward compatibility with newer cudnn versions.

v0.8

16 Feb 07:46 · 8f488bd

[New API] Added support for the Reshape operation.
[New API] Added support for the DgradDreluBNBwdWeight operation.

[Minor Enhancement] Added cudnn frontend enums to simplify Resample operation creation.
[Minor Enhancement] Added alpha and beta values as keys for the plan caches.

[Bug Fix] Fixed an error which was causing the reference code to fail with a segmentation fault.
[Bug Fix] Fixed an issue where stride/padding and dilation values were incorrectly cached for 2d convolutions.
[Bug Fix] Fixed issues where error statuses were not handled correctly during tensor creation.

[Samples] Added a new sample to showcase how an fMHA graph can be programmed through the FE API. This sample contains both fprop and backprop graphs.
[Samples] Added a new sample to showcase the DgradDreluBNBwdWeight operation.

[Samples] Added a modular block which models the fprop of a resnet residual block.

v0.7.3

28 Oct 03:47 · 81a041a

Release Notes:

[Enhancement] Added a CUDNN_FRONTEND_VERSION macro to cudnn_frontend.
[Enhancement] Added the inline keyword to the get_plan functions to enable inclusion in multiple compilation units.
[Bug fix] Replaced CUDNN with CUDNN_VERSION as the correct macro name.