CuDNN Convolution Benchmark

Prerequisites

CUDA Toolkit

Building

mkdir build
cd build
cmake ..
make

Note: The most recent cuDNN distribution will be obtained automatically by installing PyTorch into the build directory.

Usage

Run all tests at once:

cd build
ctest

Run an individual test by providing the following arguments:

$ ./bin/benchmark file_name output_file_name data_type all_formats operation_mode num_repeats [input_tensor_format output_tensor_format kernel_tensor_format]

file_name: path to the file with convolution cases (for example, conv_example.txt);
output_file_name: path to the output file with benchmark results;
data_type: data type used (accepted values are fp16, fp32, fp64, int8, uint8, int32, int8x4, uint8x4, uint8x32);
all_formats: 1 if all input/output/kernel tensor formats should be tested, 0 to run with specific data formats only;
num_repeats: number of repetitions for each convolution algorithm.

If all_formats is set to 0, the following additional arguments must be specified:

input_tensor_format: input tensor data format (accepted values are NCHW, NHWC, NCHW_VECT_C);
output_tensor_format: output tensor data format (accepted values are NCHW, NHWC, NCHW_VECT_C);
kernel_tensor_format: kernel tensor data format (accepted values are NCHW, NHWC, NCHW_VECT_C).

Examples:

Test specific data formats:

$ ./bin/benchmark conv_example.txt out_example.csv fp32 0 100 NHWC NHWC NHWC

Test all data formats:

$ ./bin/benchmark conv_example.txt out_example.csv fp32 1 1000
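When the benchmark is driven from a script, it can help to validate the arguments before launching it. The following is a minimal sketch, not part of this repository: the helper name build_benchmark_args and its parameters are hypothetical, and the argument order is assumed to follow the two examples above (file_name, output_file_name, data_type, all_formats, num_repeats, then the three optional formats). It only builds the argument list; it does not run the binary.

```python
# Accepted values as listed in the Usage section.
DATA_TYPES = {"fp16", "fp32", "fp64", "int8", "uint8", "int32",
              "int8x4", "uint8x4", "uint8x32"}
TENSOR_FORMATS = {"NCHW", "NHWC", "NCHW_VECT_C"}


def build_benchmark_args(file_name, output_file_name, data_type, num_repeats,
                         formats=None, binary="./bin/benchmark"):
    """Build the argument list for one benchmark run (hypothetical helper).

    formats is None to test all formats (all_formats = 1), or a tuple of
    (input, output, kernel) tensor formats (all_formats = 0).
    """
    if data_type not in DATA_TYPES:
        raise ValueError(f"unsupported data type: {data_type}")
    if num_repeats < 1:
        raise ValueError("num_repeats must be a positive integer")

    args = [binary, file_name, output_file_name, data_type]
    if formats is None:
        # all_formats = 1: no per-tensor formats are passed.
        args += ["1", str(num_repeats)]
    else:
        if len(formats) != 3:
            raise ValueError("expected (input, output, kernel) formats")
        for fmt in formats:
            if fmt not in TENSOR_FORMATS:
                raise ValueError(f"unsupported tensor format: {fmt}")
        # all_formats = 0: the three formats follow num_repeats.
        args += ["0", str(num_repeats), *formats]
    return args
```

The returned list can be passed to subprocess.run(args, check=True) from the directory that contains bin/benchmark.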

Obtaining results

Running the benchmark produces an output_file_name file in your working directory.

A value of n/a means that the combination of input tensor dimensions, filter tensor dimensions and output tensor dimensions is not supported for the specified algorithm on your GPU.

A value of - means that this convolution is not supported for the specified algorithm on your GPU.

Example contents for ./bin/benchmark conv_example.txt out_example.txt fp32 0 1 10 NCHW NCHW NCHW:

input_format	output_format	filter_format	W	H	C	N	K	S	R	pad_w	pad_h	stride_w	stride_h	out_w	out_h	input_stride_w	input_stride_h	filter_stride_w	filter_stride_h	FWD_GEMM	FWD_GEMM WORKSPACE	FWD_IMPLICIT_GEMM	FWD_IMPLICIT_GEMM WORKSPACE	FWD_PRECOMP_GEMM	FWD_PRECOMP_GEMM WORKSPACE	FWD_DIRECT	FWD_DIRECT WROKSPACE	FWD_FFT	FWD_FFT WORKSPACE	FWD_FFT_TILING	FWD_FFT_TILING WORKSPACE	FWD_WINOGRAD	FWD_WINOGRAD WORKSPACE	FWD_WINOGRAD_NONFUSED	FWD_WINOGRAD_NONFUSED WORKSPACE	BWD_FILTER_ALGO_0	BWD_FILTER_ALGO_0 WORKPACE	BWD_FILTER_ALGO_1	BWD_FILTER_ALGO_1 WORKSPACE	BWD_FILTER_ALGO_3	BWD_FILTER_ALGO_3 WORKSPACE	BWD_FILTER_FFT	BWD_FILTER_FFT WORKSPACE	BWD FILTER FFT_TILING	BWD FILTER FFT_TILING WORKSPACE	BWD_DATA_ALGO_0	BWD_DATA_ALGO_0 WORKSPACE	BWD_DATA_ALGO_1	BWD_DATA_ALGO_1 WORKSPACE	BWD_DATA_FFT	BWD_DATA_FFT WORKSPACE	BWD_DATA_FFT_TILING	BWD_DATA_FFT_TILING WORKSPACE	BWD_DATA_WINOGRAD	BWD_DATA_WINOGRAD WORKSPACE	BWD_DATA_WINOGRAD_NONFUSED	BWD_DATA_WINOGRAD_NONFUSED WORKSPACE
NCHW	NCHW	NCHW	1	1	256	32	324	3	3	1	1	1	1	1	1	1	1	1	1	370.914	294912	471.803	0	587.052	9216	n/a		15303.9	212963328	42105.4	441769984	7298.59	8754448	2435.43	14616576	8761.88	0	333.403	6336	8901.35	0	n/a		n/a		3661.55	0	1157.1	0	11922.6	217976832	38886.3	441769984	6682.76	8360960	2169.12	14616576	
NCHW	NCHW	NCHW	1	1	256	32	16	3	3	1	1	1	1	1	1	1	1	1	1	231.894	294912	457.18	0	452.277	9216	n/a		2915.1	28459008	9094.39	55730176	562.581	671808	417.447	1843200	369.539	0	43.1993	576	366.571	0	n/a		n/a		494.008	0	291.386	0	1501.37	19021824	4851.48	55730176	649.183	410624	398.059	1843200	
NCHW	NCHW	NCHW	3	3	256	32	324	3	3	1	1	1	1	3	3	1	1	1	1	1474.53	2654208	2193.58	0	1353.02	60	n/a		15177.2	212963328	43215.7	441769984	3892.56	8754448	2453.19	14616576	9273.82	0	1711.97	2572	1770.54	2356	13231.9	191476224	n/a		4373.88	0	1019.39	2236	11924.6	217976832	39019.9	441769984	3586.94	8360960	2173.33	14616576	
NCHW	NCHW	NCHW	3	3	256	32	16	3	3	1	1	1	1	3	3	1	1	1	1	348.08	2654208	652.083	0	301.795	60	n/a		2989.4	28459008	9137.38	55730176	421.972	671808	420.687	1843200	423.354	0	139.508	2428	126.237	2356	1967.64	20072448	n/a		965.804	0	123.734	2236	1342.91	19021824	4955.75	55730176	352.066	410624	411.616	1843200	
.....
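The output file can be post-processed with a short script. The sketch below is an illustration, not part of this repository: the function names parse_results and fastest_forward are hypothetical. It assumes the file is tab-separated as in the example above, treats n/a, - and empty fields as missing, and assumes that the numeric value in each algorithm column is a measurement where lower is better (the README does not state the unit). Because some workspace headers in the example are misspelled, workspace columns are recognised by the space in their name rather than by their suffix.

```python
import csv
import io

# Field values that mark an unsupported algorithm or an absent workspace size.
MISSING = {"", "n/a", "-"}


def parse_results(text):
    """Parse tab-separated benchmark output into a list of row dicts."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = []
    for fields in reader:
        # Skip blank lines and elision markers such as ".....".
        if not fields or fields[0].startswith("..."):
            continue
        # zip() drops the empty field produced by a trailing tab.
        rows.append(dict(zip(header, fields)))
    return rows


def fastest_forward(row):
    """Return (algorithm, value) with the lowest forward value, or None."""
    best = None
    for name, value in row.items():
        # Forward algorithm columns start with FWD_ and contain no space;
        # the matching workspace columns do contain a space.
        if not name.startswith("FWD_") or " " in name:
            continue
        if value.strip() in MISSING:
            continue
        measured = float(value)
        if best is None or measured < best[1]:
            best = (name, measured)
    return best
```

For example, calling fastest_forward on each row returned by parse_results(open("out_example.csv").read()) gives one forward algorithm per convolution case, or None when no forward algorithm is supported.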
