FE v1.0 API is aimed to extend functionality and usage exposed by the cuDNN C backend API. Both C++ and python APIs are provided, and both have functional parity.
For a general introduction to FE, please start with README.md.
In the frontend v1 API, you can describe multiple operations that form subgraphs through a persistent cudnn_frontend::graph::Graph object. Unlike the frontend v0.x API, you don't have to worry about specifying shapes and sizes of the intermediate virtual tensors. The frontend v1 API extends the groundwork of earlier versions and introduces a new set of APIs to further simplify the workflow.
Additionally, the frontend v1 API provides Python bindings to all API. Refer to samples/cpp and samples/python for more details on its usage. With the release of v1, we are bumping up the minimum supported cuDNN version to 8.5.0.
The steps involved in building and running a cudnn graph are as follows:
- Create a cudnn graph and specify the global properties. The global properties like compute precision and input/output data type help infer properties that are not explicitly mentioned.
- Create and add the input tensors.
- Create and add the operation nodes. The outputs of these operation are of tensor type and can be sequentially used as inputs to the next node.
- Validate the operation graph. This step makes sure the graph is well built and does not have hanging tensors or node.
- Build the cudnn operation graph. This step lowers the graph into cudnn dialect.
- Create the execution plan, based on the heuristics type of your choice.
- Check support of the operation graph.
- [Optional] Filter out the plans by your custom criteria (Optional).
- Build (one or all) the execution plans.
- [Optional] Run autotuning on the filtered plan (Optional).
- Execute the graph with the relevant data pointers.
FE v1.0 API follows a functional style of building a graph. Operations take in input tensors and return output tensors. This also allows composition of operations.
Purpose | C++ API | Python API |
---|---|---|
Create tensor | tensor | tensor |
Convolution Fprop | conv_fprop Conv_fprop_attributes |
conv_fprop |
Convolution Dgrad | conv_dgrad Conv_dgrad_attributes |
conv_dgrad |
Convolution Wgrad | conv_wgrad Conv_wgrad_attributes |
conv_wgrad |
Matrix Multiplication | matmul Matmul_attributes |
matmul |
Pointwise Operations | pointwise Pointwise_attributes |
- add - bias - rqsrt - sub - mul - scale - relu - elu - gelu - cmp_gt |
Batch Normalization | batchnorm Batchnorm_attributes |
batchnorm |
Batch Norm bprop | batchnorm_backward Batchnorm_backward_attributes |
batchnorm_backward |
Generate stats of output | genstats Genstats_attributes |
genstats |
BN Finalize of stats | bn_finalize BN_finalize_attributes |
bn_finalize |
Dbn weight | dbn_weight DBN_weight_attributes |
dbn_weight |
Resampling | resample Resample_attributes |
resample |
Scale dot product attention | sdpa SDPA_attributes |
sdpa |
Scale dot product attention backward | sdpa_backward SDPA_backward_attributes |
sdpa_backward |
Scale dot product attention FP8 | sdpa_fp8 SDPA_fp8_attributes |
sdpa_fp8 |
Scale dot product attention backward FP8 | sdpa_fp8_backward SDPA_fp8_backward_attributes |
sdpa_fp8_backward |
Slice | slice Slice_attributes |
slice |
Instantiate an object of class cudnn_frontend::graph::Graph
which will house tensors and operations.
Optional graph level attributes can be set on the object:
cudnn_frontend::graph::Graph& set_io_data_type(cudnn_frontend::DataType_t)
cudnn_frontend::graph::Graph& set_intermediate_data_type(cudnn_frontend::DataType_t)
cudnn_frontend::graph::Graph& set_compute_data_type(cudnn_frontend::DataType_t)
These attributes are meant to used as default in case they are not provided for constituent tensors and operations.
Users create input tensors to provide to operations within a graph. To add tensors in a graph, use:
std::shared_ptr<cudnn_frontend::graph::Tensor_attributes> cudnn_frontend::graph::tensor(cudnn_frontend::graph::Tensor_attributes)
.
As the API returns a shared pointer, both the user and FE graph are owners of the tensor.
Tensor attributes is a lightweight structure with setters for each attribute.
cudnn_frontend::graph::Tensor_attributes& set_data_type(cudnn_frontend::DataType_t)
cudnn_frontend::graph::Tensor_attributes& set_dim(std::vector<int64_t>&)
cudnn_frontend::graph::Tensor_attributes& set_stride(std::vector<int64_t>&)
cudnn_frontend::graph::Tensor_attributes& set_is_virtual(bool)
cudnn_frontend::graph::Tensor_attributes& set_is_pass_by_value(bool)
cudnn_frontend::graph::Tensor_attributes& set_reordering_type(cudnn_frontend::TensorReordering_t)
cudnn_frontend::graph::Tensor_attributes& set_name(std::string&)
Operations take in mandatory input tensor via positional arguments. Optional input tensors are provided using corresponding setters in operation attributes.
Operations return an ordered array of output tensors. Any optional outputs if not present will have their shared pointers pointing to std::nullptr
.
Please looks at operations section for more details.
Validate API ensures API usage is sound, checks against dangling tensors, etc. Internally, any unspecified properties like dimensions, strides, etc are inferred.
cudnn_frontend::error_t cudnn_frontend::graph::Graph::validate()
This method creates cudnn backend descriptors for all constituents of the graph.
cudnn_frontend::error_t cudnn_frontend::graph::Graph::build_operation_graph(cudnnHandle_t handle)
This method internally queries the heuristics for engine configs for the given heuristics modes.
cudnn_frontend::error_t cudnn_frontend::graph::Graph::get_execution_plans(std::vector<heur_mode_t>)
This method returns the number of execution plans returned by cudnn heuristics. Each plan gets an index from 0 to #plans-1, with 0 having top priority.
cudnn_frontend::int64_t
cudnn_frontend::Graph::get_execution_plan_count() const;
This method guarantees that executing the graph using plans queried will succeed.
cudnn_frontend::error_t cudnn_frontend::graph::Graph::check_support(cudnnHandle_t h);
This function builds execution plans queried with create_execution_plan(...)
API.
There are two flavours of this API:
Use this method to build execution plans according to a policy. Suitable when trusting cudnn heuristics to return nest suitable execution plan with top priority.
cudnn_frontend::error_t
cudnn_frontend::graph::Graph::build_plan(
cudnnHandle_t const &handle,
cudnn_frontend::BuildPlanPolicy_t const policy,
bool const do_multithreaded_builds
);
Use this method to build individual plan indices. Main use case is to build execution plans in parallel when autotuning.
Plan index to be used here can be queried with get_execution_plan_count(...)
API.
cudnn_frontend::error_t
cudnn_frontend::Graph::build_plan_at_index(
cudnnHandle_t const &handle,
int64_t plan_index
);
Users can filter plans on numerical, behavioral notes, or plans that do not provide desired functional correctness.
cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::select_numeric_notes(std::vector<cudnn_frontend::NumericalNote_t> const&);
cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::select_behavior_notes(std::vector<cudnn_frontend::BehaviorNote_t> const&);
cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_numeric_notes(std::vector<cudnn_frontend::NumericalNote_t> const&);
cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_behavior_notes(std::vector<cudnn_frontend::BehaviorNote_t> const&);
cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_workspace_greater_than(int64_t const workspace);
cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_shared_mem_greater_than(int64_t const shared_memory);
Autotuning provides a way to execute different execution plans for a given graph and measure their relative performance under run time conditions. This generally helps validate and improve upon the results provided by the heuristics. Please refer to samples
Executing the graph requires device pointers to all input output tensors and a user allocated device workspace pointer.
Two flavours of execute exists, corresponding to build_plans(...)
API.
This API already has a candidate execution plan set. Candidate execution plan get internally set either:
- if build_policy_t::HEURISTIC_CHOICE is used, or
- as the last plan built that got built.
cudnn_frontend::error_t
cudnn_frontend::graph::Graph::execute(
cudnnHandle_t handle,
std::unordered_map<std::shared_ptr<Tensor>, void *> var_pack,
void* workspace
);
execute API also takes a plan index to target a specific plan. This may be used when autotuning, in conjunction with build_plan_at_index(...)
API.
cudnn_frontend::error_t
cudnn_frontend::graph::Graph::execute(
cudnnHandle_t handle,
std::unordered_map<std::shared_ptr<Tensor>, void *> var_pack,
void* workspace,
int64_t plan_index
);
Get workspace to execute the current selected execution plan.
Can also take in a plan index to query workspace for. This may be used when autotuning, in conjunction with build_plan_at_index(...)
API.
int64_t get_workspace_size() const
int64_t get_workspace_size_plan_index(int64_t plan_index) const
Get workspace to run autotune on all plans.
get_autotune_workspace_size() const
Frontend v1.0 API provides two flavors of serialization. One is to checkpoint after the initial graph specification (before calling validate) and other after building the execution plan (to save on plan creation).
void serialize(json &j) const
void deserialize(const json &j)
The above two APIs are meant to capture the user specified input tensors and nodes into the graph. This can be used to generate the log (for debugging) or to visualize the graph being created.
error_t serialize(std::vector<uint8_t> &data) const
error_t deserialize(cudnnHandle_t handle, std::vector<uint8_t> const &data)
A fully built graph can be serialized into a binary blob of data with the above two APIs. Note:
- Not all engine configs support serialization.
- It is the users responsibility to make sure the UIDs of tensor being passed to the variant pack remain consistent before and after serialization.
C++ API returns a error object which has a error code and error message.
Python API throws an exception with similar error message to be handled in python API.
Samples are meant to illustrate FE v1.0 API usage to users.
samples/cpp
contains samples that use C++ API.samples/python
contains samples that use python API.
Python samples are jupyter notebooks with step by step guide on using FE v1 API.
Please look at docs/operations for APIs of different operation types.