CUDA Resize-18 implementation #19595

yuslepukhin · 2024-02-22T01:04:46Z

Description

Implement Resize-18 on CUDA.

Motivation and Context

Performance

Make use of unsafe string constructor that is able to convert native UTF-8 string straight into the string instance buffer.

This reverts commit e5fc5d4.

Add opset 18 features to CUDA, except antialiasing Setting up Antialias filters Dispatch SetupTriliner Move buffer allocation and move antialias to separate file Compiles and runs tests CPU Testing compiles Invoking SetupFilter Adjust inferred dimensions FP works, needs to redo for int Fix int32 case Fixes Bounds fix Finish upscaling setup tests Make Upsample parallel Implement Level1 and Level2 interpolation Implement interpolation and extrapolation kernels Refactor for local allocations Working on Bilinear Upsample Bilinear works Fix Dtype BiCubic works Move Trilinear to function Trilinear 2 steps work Level22 results mismatch. Works 3-D Fix align corners Make BiLinear function Make BiCubic a function CUDA Works

onnxruntime/core/providers/cuda/tensor/upsample.cc

onnxruntime/core/providers/cuda/tensor/resize_antialias_impl.cu

onnxruntime/core/providers/cpu/tensor/upsample_antialias.h

onnxruntime/core/providers/cpu/tensor/upsamplebase.h

onnxruntime/core/providers/cuda/tensor/resize_impl.h

onnxruntime/core/providers/cpu/tensor/upsample.cc

gedoensmax · 2024-05-30T11:09:38Z

@yuslepukhin or @tianleiwu can you elaborate on the check used here:

onnxruntime/onnxruntime/core/providers/cuda/tensor/upsample.cc

Lines 180 to 184 in 5ee62a6

    
    if (!is_3D) {
      if (!(scales[0] == 1.0f && scales[1] == 1.0f)) {
        return ORT_MAKE_STATUS(ONNXRUNTIME, NOT_IMPLEMENTED, "Resize", ": NDHWC is not supported yet");
      }
    }

I am not clear on why this is a safe check for NDHWC vs. NCDHW, as the scales for D and C are often both 1.0f, as suggested by some unit tests:

onnxruntime/onnxruntime/test/providers/cpu/tensor/resize_op_test.cc

Line 779 in 8293aa1

std::vector<float> scales{1.0f, 1.0f, 2.0f, 2.0f, 1.0f};

I am looking at this to support the operators fully with channel last.
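To make the concern concrete, here is a minimal sketch. It assumes the check only inspects the first two scale entries, as in the quoted snippet. The helper name `LeadingScalesAreOne` and the two scale vectors are hypothetical and are not taken from the ORT sources. Both an NCDHW-style and an NDHWC-style scale vector satisfy the condition, so the condition alone cannot tell the two layouts apart:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the quoted condition: it looks only at the first
// two scale entries and has no knowledge of the actual memory layout.
inline bool LeadingScalesAreOne(const std::vector<float>& scales) {
  return scales.size() >= 2 && scales[0] == 1.0f && scales[1] == 1.0f;
}

// Illustrative scale vectors; the layout labels are assumptions of this sketch.
// Order N, C, D, H, W with D left unscaled.
inline std::vector<float> NcdhwScales() { return {1.0f, 1.0f, 1.0f, 2.0f, 2.0f}; }
// Order N, D, H, W, C, matching the unit test vector quoted above.
inline std::vector<float> NdhwcScales() { return {1.0f, 1.0f, 2.0f, 2.0f, 1.0f}; }

// Both vectors pass the condition, which is the ambiguity being asked about.
inline void CheckBothLayoutsPass() {
  assert(LeadingScalesAreOne(NcdhwScales()));
  assert(LeadingScalesAreOne(NdhwcScales()));
}
```

If that reading is right, a channel-last input whose depth is not scaled would go through the same path as NCDHW, so an explicit layout flag may be needed rather than inferring the layout from the scales.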

gedoensmax · 2024-05-30T13:47:08Z

I also believe the resize kernel is not sufficiently tested for NCHW + int8/uint8 cases in the CUDA EP:

onnxruntime/onnxruntime/core/providers/cuda/tensor/resize_impl.cu

Lines 328 to 334 in 5ee62a6

    
    float y_offset_1 = 1.0f - y_offset_0;
    float x_offset_1 = 1.0f - x_offset_0;
    output_data[id] =
        x00 * static_cast<T>(y_offset_1 * x_offset_1) +
        x01 * static_cast<T>(y_offset_0 * x_offset_1) +
        x10 * static_cast<T>(y_offset_1 * x_offset_0) +
        x11 * static_cast<T>(y_offset_0 * x_offset_0);

Judging by this line, the result for int kernels will be 0 whenever the interpolation weights are fractional, because each weight truncates to 0 when it is cast to an integer T. I verified this by adding the following unit test, which is the same as NhwcResizeOpLinearDownSampleTest_4DBilinear_uint8. Please let me know if this sounds sensible; then I will go ahead and file an issue for this case.

TEST(ResizeOpTest, ResizeOpLinearDownSampleTest_4DBilinear_uint8) {
  OpTester test("Resize", 13);
  std::vector<float> roi{};
  std::vector<float> scales{1.0f, 1.0f, 0.6f, 0.6f};

  test.AddAttribute("mode", "linear");

  constexpr int64_t N = 1, H = 2, W = 4, C = 1;
  std::vector<uint8_t> X = {
      1, 2, 3, 4,
      5, 6, 7, 8};

  test.AddInput<uint8_t>("X", {N, C, H, W}, X);
  test.AddInput<float>("roi", {0}, roi);
  test.AddInput<float>("scales", {4}, scales);

  std::vector<uint8_t> Y = {2, 4};

  test.AddOutput<uint8_t>("Y", {N, C, static_cast<int64_t>(H * scales[2]), static_cast<int64_t>(W * scales[3])}, Y);
  // ROCm: results mismatch
  test.Run(OpTester::ExpectResult::kExpectSuccess, "",
           {kRocmExecutionProvider});
}
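The truncation can be reproduced on the host without CUDA. The sketch below is illustrative only: `BlendCastWeights` mirrors the quoted expression, while `BlendFloatAccumulate` is a hypothetical alternative that accumulates in float and casts once at the end. Neither name comes from the ORT sources, and the final cast here truncates, which may differ from the rounding the CPU kernel uses.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the quoted expression: every weight is cast to T before the multiply.
// For integer T, any weight in [0, 1) becomes 0.
template <typename T>
T BlendCastWeights(T x00, T x01, T x10, T x11, float y_offset_0, float x_offset_0) {
  const float y_offset_1 = 1.0f - y_offset_0;
  const float x_offset_1 = 1.0f - x_offset_0;
  return static_cast<T>(
      x00 * static_cast<T>(y_offset_1 * x_offset_1) +
      x01 * static_cast<T>(y_offset_0 * x_offset_1) +
      x10 * static_cast<T>(y_offset_1 * x_offset_0) +
      x11 * static_cast<T>(y_offset_0 * x_offset_0));
}

// Hypothetical alternative: accumulate in float, convert to T once at the end.
// The final static_cast truncates toward zero (an assumption of this sketch).
template <typename T>
T BlendFloatAccumulate(T x00, T x01, T x10, T x11, float y_offset_0, float x_offset_0) {
  const float y_offset_1 = 1.0f - y_offset_0;
  const float x_offset_1 = 1.0f - x_offset_0;
  const float blended =
      static_cast<float>(x00) * (y_offset_1 * x_offset_1) +
      static_cast<float>(x01) * (y_offset_0 * x_offset_1) +
      static_cast<float>(x10) * (y_offset_1 * x_offset_0) +
      static_cast<float>(x11) * (y_offset_0 * x_offset_0);
  return static_cast<T>(blended);
}
```

With inputs 1, 5, 2, 6 and both offsets at 0.5, every weight is 0.25. The cast-weights version yields 0 for `uint8_t`, while the float-accumulating version yields 3 (3.5 truncated). For `float` both versions give 3.5.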

yuslepukhin · 2024-05-30T19:19:21Z

Go ahead and file an issue, and we will look into it. If you have a suggested change, do not hesitate to propose.

yuslepukhin added 8 commits November 28, 2023 15:29

Eliminate intermediate string conversion buffer.

e5fc5d4

Make use of unsafe string constructor that is able to convert native UTF-8 string straight into the string instance buffer.

Revert "Eliminate intermediate string conversion buffer."

288f465

This reverts commit e5fc5d4.

Merge branch 'main' of https://github.com/microsoft/onnxruntime

9f90a58

Merge branch 'main' of https://github.com/microsoft/onnxruntime

a60fadc

Merge branch 'main' of https://github.com/microsoft/onnxruntime

b033f47

Merge branch 'main' of https://github.com/microsoft/onnxruntime

47b3dca

Merge branch 'main' of https://github.com/microsoft/onnxruntime

b6a9927

Merge branch 'main' of https://github.com/microsoft/onnxruntime

0c3ea39

yuslepukhin requested review from wschin, wejoncy, xadupre and tianleiwu February 22, 2024 01:04

yuslepukhin added 2 commits February 23, 2024 12:15

Merge branch 'main' of https://github.com/microsoft/onnxruntime

f50bb0c

yuslepukhin force-pushed the yuslepukhin/resize_cuda_18 branch from 9f766d2 to 976f6a9 Compare February 23, 2024 22:53

Docs lint

250f54f

yuslepukhin marked this pull request as ready for review February 24, 2024 00:13

wejoncy reviewed Feb 24, 2024

onnxruntime/core/providers/cuda/tensor/upsample.cc

wejoncy reviewed Feb 24, 2024

onnxruntime/core/providers/cuda/tensor/resize_antialias_impl.cu Outdated

wejoncy reviewed Feb 24, 2024

onnxruntime/core/providers/cuda/tensor/resize_antialias_impl.cu Outdated

wejoncy reviewed Feb 24, 2024

onnxruntime/core/providers/cuda/tensor/resize_antialias_impl.cu Outdated

wejoncy reviewed Feb 24, 2024

onnxruntime/core/providers/cuda/tensor/resize_antialias_impl.cu Outdated

wejoncy reviewed Feb 24, 2024

onnxruntime/core/providers/cpu/tensor/upsample_antialias.h Outdated