
The results of each run of onnxruntime in cudnn_conv_algo_search(EXHAUSTIVE) mode are different, with an accuracy difference of approximately 1e-6 #19822

Closed
lzcchl opened this issue Mar 7, 2024 · 2 comments
Labels
ep:CUDA issues related to the CUDA execution provider

Comments


lzcchl commented Mar 7, 2024

My code:
##################################################################################
import os
import numpy as np
import onnxruntime as ort
from torchvision import transforms
from PIL import Image

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
cuda = True
# pip install onnxruntime-gpu for CUDAExecutionProvider
# conda install cudatoolkit if you hit 'Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so'
# cudnn_conv_algo_search can be EXHAUSTIVE, HEURISTIC, or DEFAULT
providers = [('CUDAExecutionProvider', {'cudnn_conv_algo_search': 'EXHAUSTIVE'}), 'CPUExecutionProvider'] if cuda else ['CPUExecutionProvider']

if __name__ == '__main__':
    weights = '/home/lzc/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.onnx'
    data_dir = r'./dataset'

    session = ort.InferenceSession(weights, providers=providers)
    outname = [i.name for i in session.get_outputs()]  # ['output']
    inname = [i.name for i in session.get_inputs()]    # ['images']

    # Preprocessing: resize, center-crop, and normalize to match training.
    infer_transform = transforms.Compose([
        transforms.Resize([256, 256]),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.4848, 0.4435, 0.4023], std=[0.2744, 0.2688, 0.2757])
    ])

    for file_name in os.listdir(data_dir):
        if file_name.endswith('.jpg'):
            file_path = os.path.join(data_dir, file_name)
            img = Image.open(file_path)
            input_tensor = infer_transform(img)
            input_batch = input_tensor.unsqueeze(0).numpy()

            # ONNX Runtime inference
            inp = {inname[0]: input_batch}
            outputs = session.run(outname, inp)[0]
            print('finish')

##################################################################################

This is a simple resnet50 example, and I have run many experiments with it. My conclusion: when "cudnn_conv_algo_search" in "providers" is set to EXHAUSTIVE, the outputs differ from run to run by roughly 1e-6; with HEURISTIC or DEFAULT, the results are identical on every run.

In addition, I ran the same ONNX model with the Triton Inference Server, and its results were identical every time. So I reviewed the source code of the Triton onnxruntime backend and found that it also uses EXHAUSTIVE (see "cudnn_conv_algo_search" at line 563 of https://github.com/triton-inference-server/onnxruntime_backend/blob/main/src/onnxruntime.cc).

This confuses me. What causes the difference, and how can the results be made exactly reproducible in EXHAUSTIVE mode?
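For reference, this is a minimal way to quantify the run-to-run difference described above; the arrays here are made-up stand-ins for outputs captured from two separate runs of the script, not real model outputs:

```python
import numpy as np

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two runs' outputs."""
    a64 = np.asarray(a, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    return float(np.max(np.abs(a64 - b64)))

# Hypothetical logits from two runs of session.run(...) with EXHAUSTIVE search.
run1 = np.array([0.123456, -1.2], dtype=np.float32)
run2 = np.array([0.123457, -1.2], dtype=np.float32)
print(max_abs_diff(run1, run2))  # on the order of 1e-6 for this made-up pair
```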

@github-actions github-actions bot added the ep:CUDA issues related to the CUDA execution provider label Mar 7, 2024
@tianleiwu
Contributor

tianleiwu commented Mar 7, 2024

Note that algo tuning is not deterministic: it might select different algorithms at different times. EXHAUSTIVE also has a higher chance of choosing a non-deterministic algo than HEURISTIC does.

@hariharans29
Member

It is possible that EXHAUSTIVE algo search ends up picking a non-deterministic algo (as Tianlei mentioned). It is also reasonable that the "most optimal" algo returned by EXHAUSTIVE search uses a technique like "split k" to improve SM occupancy for small filter sizes, and that is bound to introduce variance in the results. Asking cuDNN to pick the "most optimal" *deterministic* algo during the EXHAUSTIVE search is beyond the scope of what ORT can do.
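The reduction-order effect behind this is easy to reproduce outside cuDNN. This numpy sketch (my own illustration, not ORT or cuDNN code) sums the same float32 data under two different groupings of partial sums, loosely analogous to a direct convolution algo versus a split-k one; because float32 addition is not associative, the two results typically differ at around the 1e-6 level:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1 << 20).astype(np.float32)  # 1M float32 values

# Grouping A: 1024 chunks of 1024, accumulated sequentially.
seq = np.float32(0.0)
for chunk in x.reshape(1024, 1024):
    seq += chunk.sum(dtype=np.float32)

# Grouping B ("split-k" style): 64 partial sums over 16384 elements each,
# then a final combine of the partials.
partials = x.reshape(64, 16384).sum(axis=1, dtype=np.float32)
splitk = partials.sum(dtype=np.float32)

# Both are valid sums of the same data; the difference is typically
# small but nonzero because the rounding order differs.
print(abs(float(seq) - float(splitk)))
```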
