Releases: NVIDIA/cudnn-frontend
cudnn FE 1.0 pre-release-4
[API change] `Scaled_dot_product_flash_attention_attributes` and `Scaled_dot_product_flash_attention_backward_attributes` now accept K and V tensors instead of K-transpose and V-transpose. This is a deviation from the backend API; the change was made based on feedback from multiple customers.
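For illustration, here is a minimal sketch of the updated call through the Python bindings; the method and keyword names (`pygraph`, `scaled_dot_product_flash_attention`, `q`/`k`/`v`) are assumptions based on the FE 1.0 samples and may differ slightly in this pre-release:

```python
import cudnn  # cudnn-frontend python bindings (assumed module name)

graph = cudnn.pygraph(
    io_data_type=cudnn.data_type.HALF,
    intermediate_data_type=cudnn.data_type.FLOAT,
    compute_data_type=cudnn.data_type.FLOAT,
)

b, h, s, d = 4, 12, 1024, 64
# K and V are now passed in their natural (B, H, S, D) layout; no explicit
# K-transpose / V-transpose tensors need to be created by the caller.
q = graph.tensor(name="Q", dim=[b, h, s, d], stride=[h * s * d, s * d, d, 1])
k = graph.tensor(name="K", dim=[b, h, s, d], stride=[h * s * d, s * d, d, 1])
v = graph.tensor(name="V", dim=[b, h, s, d], stride=[h * s * d, s * d, d, 1])

o, stats = graph.scaled_dot_product_flash_attention(
    name="sdpa", q=q, k=k, v=v, is_inference=False, use_causal_mask=True
)
```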
[New API] Added the `tensor_like` Python API, which accepts a DLPack-compatible tensor. This simplifies cudnn tensor creation.
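A hedged sketch of how `tensor_like` might be used with a DLPack-compatible framework tensor (a PyTorch CUDA tensor here); the dim, stride, and data type are expected to be inferred from the source tensor instead of being spelled out:

```python
import cudnn
import torch  # any DLPack-compatible framework; torch is just an example

graph = cudnn.pygraph()

x_gpu = torch.randn(8, 64, 56, 56, dtype=torch.half, device="cuda")

# Instead of graph.tensor(name=..., dim=..., stride=..., data_type=...),
# derive the cudnn tensor attributes directly from the framework tensor.
x = graph.tensor_like(x_gpu)
```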
[New Feature] Setting the `CUDNN_FRONTEND_ATTN_DP_WORKSPACE_LIMIT` environment variable allows choosing between different optimized cudnn backend kernels. See docs/operations/mha for more details.
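Since it is a plain environment variable, it can be set in the shell or from Python before the graph is built; the value below is only an illustrative assumption, see docs/operations/mha for the accepted values and their exact semantics:

```python
import os

# Hypothetical example value: a workspace limit in bytes that steers cudnn
# toward a backend kernel fitting within the limit (see docs/operations/mha).
os.environ["CUDNN_FRONTEND_ATTN_DP_WORKSPACE_LIMIT"] = str(256 * 1024 * 1024)
```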
[New Feature] Add RMSNorm and InstanceNorm forward and backward implementations.
[New Feature] Add alibi, padding, and layout support for the attention bprop node.
[New Feature] Introduce Python bindings for plans. These allow validating the graph and filtering plans, as sketched below.
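A rough sketch of the intended flow; the plan-related method names (`build_operation_graph`, `create_execution_plans`, `check_support`, `build_plans`) are assumptions taken from later FE 1.x samples and may differ in this pre-release:

```python
import cudnn

graph = cudnn.pygraph(io_data_type=cudnn.data_type.FLOAT,
                      compute_data_type=cudnn.data_type.FLOAT)
a = graph.tensor(name="A", dim=[8, 32, 64], stride=[32 * 64, 64, 1])
b = graph.tensor(name="B", dim=[8, 32, 64], stride=[32 * 64, 64, 1])
c = graph.add(a=a, b=b)                       # pointwise add binding
c.set_output(True).set_data_type(cudnn.data_type.FLOAT)

graph.validate()                              # check the graph is well formed
graph.build_operation_graph()                 # lower to a backend operation graph
graph.create_execution_plans([cudnn.heur_mode.A, cudnn.heur_mode.FALLBACK])
graph.check_support()                         # filter plans the device cannot run
graph.build_plans()                           # finalize the remaining candidates
```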
[Bug Fix] Fix relative includes of filenames in cudnn_frontend headers. This resolves compilation issues in certain toolchains.
[Bug Fix] Fix a segfault that occurred when dropout was set for some scaled dot product flash attention nodes.
[New samples] Add new samples for apply_rope, layernorm forward and backward, and rmsnorm forward and backward.
cudnn FE 1.0 pre-release 3
Improvements over prerelease 2:
[Feature] Added SDPA flash attention backward node.
[Bug fix] Resolved an issue where the computed alibi slopes were copied onto GPU memory on the default stream instead of the user-specified stream in the handle.
[Bug fix] Fixed a Windows compilation error when pedantic warnings are treated as errors.
[Bug fix] Fixed an issue in causal padding where the masked values were `std::numeric_limits<float>::min()` instead of `std::numeric_limits<float>::lowest()`.
Under investigation and development:
- We are still working on additional features for SDPA back prop.
- Better error messages and logging
cudnn FE 1.0 pre-release 2
Release Notes:
Improvements over prerelease 1:
[Feature] Added missing python bindings for several pointwise ops.
[Feature] SDPA flash attention feature parity with the backend API.
[Bug fixes] Shape inference fixes for dgrad and wgrad, where the output dimensions cannot be computed deterministically.
Under investigation and development:
- We are still working on additional features for SDPA back prop.
- CPU overhead when using the python bindings is under investigation.
- Better error messages and logging
Miscellaneous updates to the v0.x API:
[Bug fix] Some tests were failing on Ampere GPUs because no plans with 0 size were available. This has been fixed.
[Bug fix] Median-of-three sampling was incorrectly sorting the results when cudnnFind was used. This has been fixed.
[Feature] The Layer Norm API has been added and can be used with the v0.x API.
This release is experimental.
v1.0-pre-release
cudnn_frontend v1.0 pre-release introduces a new API aimed at simplifying graph construction.
The purpose of this pre-release is to solicit feedback on the new API and gather requests for enhancement.
Please create a GitHub issue for any changes or enhancements you would like to see.
[New API] In the FE v1.0 API, users can describe multiple operations that form a subgraph through the cudnn_frontend::graph::Graph object. Unlike the FE v0.x API, users don't need to worry about specifying the shapes and sizes of intermediate virtual tensors. See README.FE.1.0.md for more details; a short sketch using the Python bindings follows the next item.
[New Feature] Python bindings for the FE 1.0 API. See the Python API section in README.md for building the python bindings. Details of the Python API and its keyword arguments are in README.FE.1.0.md. Python API samples are in samples/python/*.py.
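To make the graph-construction model concrete, here is a hedged end-to-end sketch using the Python bindings; names, keyword arguments, and the build/execute flow are assumptions based on the samples in samples/python and README.FE.1.0.md. Note that the intermediate matmul result is a virtual tensor whose shape is never specified by hand:

```python
import cudnn
import torch

graph = cudnn.pygraph(
    io_data_type=cudnn.data_type.HALF,
    intermediate_data_type=cudnn.data_type.FLOAT,
    compute_data_type=cudnn.data_type.FLOAT,
)

# Only graph inputs and outputs are described; intermediates stay virtual.
x = graph.tensor(name="X", dim=[8, 32, 64], stride=[32 * 64, 64, 1])
w = graph.tensor(name="W", dim=[8, 64, 16], stride=[64 * 16, 16, 1])
bias = graph.tensor(name="bias", dim=[1, 1, 16], stride=[16, 16, 1])

xw = graph.matmul(A=x, B=w)        # virtual intermediate, shape inferred
y = graph.add(a=xw, b=bias)
y.set_output(True).set_data_type(cudnn.data_type.HALF)

graph.validate()
graph.build([cudnn.heur_mode.A])   # heuristics-driven plan selection

x_gpu = torch.randn(8, 32, 64, dtype=torch.half, device="cuda")
w_gpu = torch.randn(8, 64, 16, dtype=torch.half, device="cuda")
b_gpu = torch.randn(1, 1, 16, dtype=torch.half, device="cuda")
y_gpu = torch.empty(8, 32, 16, dtype=torch.half, device="cuda")
workspace = torch.empty(graph.get_workspace_size(),
                        dtype=torch.uint8, device="cuda")

graph.execute({x: x_gpu, w: w_gpu, bias: b_gpu, y: y_gpu}, workspace)
```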
[Deprecation] The v0.x API is now labelled deprecated and may be removed in v2.0. Consider moving to the v1.0 API. If there are issues or missing features, please create a GitHub issue.
v0.9.2
v0.9.1
[Bug Fix] Updated version numbers of the cudnn frontend release.
[Update] Updated the documentation to reflect latest version numbers.
[Update] README updated with CMake build instructions.
[Samples] Added a new Batch Norm forward and backward sample.
v0.9
[Enhancement] Added the ability to filter by tensor shape to the errata filter.
[Enhancement] Added the ability to manually override the default feature vector in the opGraph.
[Enhancement] Added support for CUDNN_POINTWISE_RECIPROCAL pointwise operation.
[Enhancement] Added an option to limit the number of kernels benchmarked in find-plan.
[Bug Fix] Fixed "Scale Bias Conv BNGenstats" test case where the sum and square sum channel dimensions were incorrect.
[Bug Fix] Fixed a compiler error "dereferencing type-punned pointer will break strict-aliasing rules" seen with certain compilers when type-casting floating point alpha/beta values to int64_t.
[Bug Fix] Waived the "ConvScaleBiasAct_int8" sample on V100 because of the lack of int8 support.
[Samples] Added BF16/FP16/FP8 Flash Attention Fprop/Bprop samples.
v0.8.1
v0.8
[New API] Added support for Reshape operation.
[New API] Added support for the DgradDreluBNBwdWeight operation.
[Minor Enhancement] Added cudnn frontend enums to simplify Resample operation creation.
[Minor Enhancement] Added alpha and beta values as keys for the plan caches.
[Bug Fix] Fixed an error which was causing reference code to fail with a segmentation fault.
[Bug Fix] Fixed an issue where stride/padding and dilation values were incorrectly cached for 2d convolutions.
[Bug Fix] Fixed issues where error statuses were not handled correctly during tensor creation.
[Samples] Added a new sample to showcase how an fMHA graph can be programmed through the FE API. This sample contains both fprop and backprop graphs.
[Samples] Added a new sample to showcase the DgradDreluBNBwdWeight operation.
[Samples] Added a modular block which models the fprop of a ResNet residual block.
v0.7.3
Release Notes:
[Enhancement] Added a CUDNN_FRONTEND_VERSION macro to cudnn_frontend.
[Enhancement] Added the inline keyword to the get_plan functions to enable inclusion in multiple compilation units.
[Bug fix] Replaced CUDNN with CUDNN_VERSION as the correct macro name.