Skip to content

Latest commit

 

History

History
92 lines (77 loc) · 5.17 KB

README.md

File metadata and controls

92 lines (77 loc) · 5.17 KB

CuDNN Convolution Benchmark

Prerequisites

Building

Export the path to the CUDA Tookit directory:

$ export CUDA_PATH=/path_to_cuda

Export the path to the cuDNN SDK directory:

$ export CUDNN_PATH=/path_to_cudnn

Check nvcc's version and cuDNN library paths to ensure your setup is correct:

$ $CUDA_PATH/bin/nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
$ ls $CUDNN_PATH/include $CUDNN_PATH/lib64

/usr/local/cuda/cuda/include:
cudnn.h

/usr/local/cuda/cuda/lib64:
libcudnn.so  libcudnn.so.7  libcudnn.so.7.5.0  libcudnn_static.a

Build the benchmark:

$ make
$ mkdir -p bin
$ $CUDA_PATH/bin/nvcc -std=c++11 -I $CUDNN_PATH/include -L $CUDNN_PATH/lib64 src/benchmark.cu -o bin/benchmark -lcudnn -lcurand

Running the benchmark

$ ./bin/benchmark file_name data_type all_formats operation_mode num_repeats [input_tensor_format output_tensor_format kernel_tensor_format]

The benchmark expects the following arguments, in the order listed:

  • file_name: path to the file with convolution cases (example);
  • output_file_name: path to the output file with benchmark results;
  • data_type: data type used (accepted values are fp16, fp32, fp64, int8, uint8, int32, int8x4, uint8x4, uint8x32);
  • all_formats: 1 if all input/output/tensor formats should be tested, 0 to run with specific data formats only;
  • num_repeats: number of repetitions for each convolution algorithm.

If all_formats is set to 0, the following additional arguments must be specified:

  • input_tensor_format: input tensor data format (accepted values are NCHW, NHWC, NCHW_VECT_C);
  • output_tensor_format: output tensor data format (accepted values are NCHW, NHWC, NCHW_VECT_C);
  • kernel_tensor_format: kernel tensor data format (accepted values are NCHW, NHWC, NCHW_VECT_C).

Examples

Example with specific data formats:

$ ./bin/benchmark conv_example.txt out_example.csv fp32 0 100 NHWC NHWC NHWC

Example with all data formats:

$ ./bin/benchmark conv_example.txt out_example.csv fp32 1 1000

Obtaining results

Running the benchmark produces a output_file_name file in your working directory.

Example contents for ./bin/benchmark conv_example.txt out_example.txt fp32 0 1 10 NCHW NCHW NCHW:

A value n/a means that the combination of the input tensor dimension, filter tensor dimension and output tensor dimension is not supported for the specified algorithm on your GPU.

A value - means that this convolution not supported for the specified algorithm on your GPU.

input_format	output_format	filter_format	W	H	C	N	K	S	R	pad_w	pad_h	stride_w	stride_h	out_w	out_h	input_stride_w	input_stride_h	filter_stride_w	filter_stride_h	FWD_GEMM	FWD_GEMM WORKSPACE	FWD_IMPLICIT_GEMM	FWD_IMPLICIT_GEMM WORKSPACE	FWD_PRECOMP_GEMM	FWD_PRECOMP_GEMM WORKSPACE	FWD_DIRECT	FWD_DIRECT WROKSPACE	FWD_FFT	FWD_FFT WORKSPACE	FWD_FFT_TILING	FWD_FFT_TILING WORKSPACE	FWD_WINOGRAD	FWD_WINOGRAD WORKSPACE	FWD_WINOGRAD_NONFUSED	FWD_WINOGRAD_NONFUSED WORKSPACE	BWD_FILTER_ALGO_0	BWD_FILTER_ALGO_0 WORKPACE	BWD_FILTER_ALGO_1	BWD_FILTER_ALGO_1 WORKSPACE	BWD_FILTER_ALGO_3	BWD_FILTER_ALGO_3 WORKSPACE	BWD_FILTER_FFT	BWD_FILTER_FFT WORKSPACE	BWD FILTER FFT_TILING	BWD FILTER FFT_TILING WORKSPACE	BWD_DATA_ALGO_0	BWD_DATA_ALGO_0 WORKSPACE	BWD_DATA_ALGO_1	BWD_DATA_ALGO_1 WORKSPACE	BWD_DATA_FFT	BWD_DATA_FFT WORKSPACE	BWD_DATA_FFT_TILING	BWD_DATA_FFT_TILING WORKSPACE	BWD_DATA_WINOGRAD	BWD_DATA_WINOGRAD WORKSPACE	BWD_DATA_WINOGRAD_NONFUSED	BWD_DATA_WINOGRAD_NONFUSED WORKSPACE
NCHW	NCHW	NCHW	1	1	256	32	324	3	3	1	1	1	1	1	1	1	1	1	1	370.914	294912	471.803	0	587.052	9216	n/a		15303.9	212963328	42105.4	441769984	7298.59	8754448	2435.43	14616576	8761.88	0	333.403	6336	8901.35	0	n/a		n/a		3661.55	0	1157.1	0	11922.6	217976832	38886.3	441769984	6682.76	8360960	2169.12	14616576	
NCHW	NCHW	NCHW	1	1	256	32	16	3	3	1	1	1	1	1	1	1	1	1	1	231.894	294912	457.18	0	452.277	9216	n/a		2915.1	28459008	9094.39	55730176	562.581	671808	417.447	1843200	369.539	0	43.1993	576	366.571	0	n/a		n/a		494.008	0	291.386	0	1501.37	19021824	4851.48	55730176	649.183	410624	398.059	1843200	
NCHW	NCHW	NCHW	3	3	256	32	324	3	3	1	1	1	1	3	3	1	1	1	1	1474.53	2654208	2193.58	0	1353.02	60	n/a		15177.2	212963328	43215.7	441769984	3892.56	8754448	2453.19	14616576	9273.82	0	1711.97	2572	1770.54	2356	13231.9	191476224	n/a		4373.88	0	1019.39	2236	11924.6	217976832	39019.9	441769984	3586.94	8360960	2173.33	14616576	
NCHW	NCHW	NCHW	3	3	256	32	16	3	3	1	1	1	1	3	3	1	1	1	1	348.08	2654208	652.083	0	301.795	60	n/a		2989.4	28459008	9137.38	55730176	421.972	671808	420.687	1843200	423.354	0	139.508	2428	126.237	2356	1967.64	20072448	n/a		965.804	0	123.734	2236	1342.91	19021824	4955.75	55730176	352.066	410624	411.616	1843200	
.....