
The results of each run of onnxruntime in cudnn_conv_algo_search(EXHAUSTIVE) mode are different, with an accuracy difference of approximately 1e-6 #19822

Closed
lzcchl opened this issue Mar 7, 2024 · 2 comments
Labels
ep:CUDA issues related to the CUDA execution provider

Comments


lzcchl commented Mar 7, 2024

My code:
##################################################################################
import os
import numpy as np
import onnxruntime as ort
from torchvision import transforms
from PIL import Image

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
cuda = True
# pip install onnxruntime-gpu for CUDAExecutionProvider
# conda install cudatoolkit if you hit 'Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so'
# cudnn_conv_algo_search can be EXHAUSTIVE, HEURISTIC, or DEFAULT
providers = [('CUDAExecutionProvider', {'cudnn_conv_algo_search': 'EXHAUSTIVE'}), 'CPUExecutionProvider'] if cuda else ['CPUExecutionProvider']

if __name__ == '__main__':
    weights = '/home/lzc/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.onnx'
    data_dir = r'./dataset'

    session = ort.InferenceSession(weights, providers=providers)
    outname = [i.name for i in session.get_outputs()]  # ['output']
    inname = [i.name for i in session.get_inputs()]    # ['images']

    # Preprocessing: resize, center-crop, and normalize to match training.
    infer_transform = transforms.Compose([
        transforms.Resize([256, 256]),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.4848, 0.4435, 0.4023], std=[0.2744, 0.2688, 0.2757])
    ])

    for file_name in os.listdir(data_dir):
        if file_name.endswith('.jpg'):
            file_path = os.path.join(data_dir, file_name)
            img = Image.open(file_path)
            input_tensor = infer_transform(img)
            input_batch = input_tensor.unsqueeze(0).numpy()

            # ONNX Runtime inference
            inp = {inname[0]: input_batch}
            outputs = session.run(outname, inp)[0]
            print('finish')

##################################################################################

This is a simple resnet50 example, and I have run many experiments with it. My conclusion: when "cudnn_conv_algo_search" in "providers" is set to EXHAUSTIVE, the outputs differ from run to run by roughly 1e-6; with HEURISTIC or DEFAULT, the results are identical on every run.

In addition, I ran the same ONNX model with the Triton Inference Server, and its results were identical every time. So I reviewed the source code of the Triton onnxruntime backend and found that it also uses EXHAUSTIVE (see "cudnn_conv_algo_search" at line 563 of https://github.com/triton-inference-server/onnxruntime_backend/blob/main/src/onnxruntime.cc).

This confuses me. What causes the difference, and how can the results be made exactly reproducible in EXHAUSTIVE mode?
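For reference, this is a minimal way to quantify the run-to-run difference described above; the arrays here are made-up stand-ins for outputs captured from two separate runs of the script, not real model outputs:

```python
import numpy as np

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two runs' outputs."""
    a64 = np.asarray(a, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    return float(np.max(np.abs(a64 - b64)))

# Hypothetical logits from two runs of session.run(...) with EXHAUSTIVE search.
run1 = np.array([0.123456, -1.2], dtype=np.float32)
run2 = np.array([0.123457, -1.2], dtype=np.float32)
print(max_abs_diff(run1, run2))  # on the order of 1e-6 for this made-up pair
```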

@github-actions github-actions bot added the ep:CUDA issues related to the CUDA execution provider label Mar 7, 2024
@tianleiwu
Contributor

tianleiwu commented Mar 7, 2024

Note that algo tuning is not deterministic: it might select different algorithms at different times. EXHAUSTIVE also has a higher chance of choosing a non-deterministic algo than HEURISTIC does.

@hariharans29
Member

It is possible that EXHAUSTIVE algo search ends up picking a non-deterministic algo (as Tianlei mentioned). It is also reasonable that the "most optimal" algo returned by EXHAUSTIVE search uses a technique like "split k" to improve SM occupancy for small filter sizes, and that is bound to introduce variance in the results. Asking cuDNN to pick the "most optimal" *deterministic* algo during the EXHAUSTIVE search is beyond the scope of what ORT can do.
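The reduction-order effect behind this is easy to reproduce outside cuDNN. This numpy sketch (my own illustration, not ORT or cuDNN code) sums the same float32 data under two different groupings of partial sums, loosely analogous to a direct convolution algo versus a split-k one; because float32 addition is not associative, the two results typically differ at around the 1e-6 level:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1 << 20).astype(np.float32)  # 1M float32 values

# Grouping A: 1024 chunks of 1024, accumulated sequentially.
seq = np.float32(0.0)
for chunk in x.reshape(1024, 1024):
    seq += chunk.sum(dtype=np.float32)

# Grouping B ("split-k" style): 64 partial sums over 16384 elements each,
# then a final combine of the partials.
partials = x.reshape(64, 16384).sum(axis=1, dtype=np.float32)
splitk = partials.sum(dtype=np.float32)

# Both are valid sums of the same data; the difference is typically
# small but nonzero because the rounding order differs.
print(abs(float(seq) - float(splitk)))
```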
