[TensorRT EP] Fix bug for DDS output handling for empty tensor #19575

chilo-ms · 2024-02-20T20:04:29Z

When a DDS output is an empty tensor (i.e. any of its dimensions is 0), TRT EP performs neither cudaMemcpyAsync() nor cuda::Impl_Cast(), to avoid accidentally overwriting a location that might belong to another tensor.

This PR also refactors the code to allocate only a single byte for each empty tensor.

#TODO: add unit tests to cover the DDS code paths, or do more testing with concurrent, sequential, and threaded faster-rcnn runs using onnx_test_runner and verify the outputs
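The guard described above can be sketched on the host as follows. This is only an illustration, not the code in this PR: `IsEmptyShape`, `ElementCount`, and `CopyDdsOutput` are hypothetical names, and `std::memcpy` stands in for cudaMemcpyAsync() so the sketch runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: a shape is "empty" when any dimension is 0,
// i.e. the tensor holds zero elements.
bool IsEmptyShape(const std::vector<int64_t>& dims) {
  for (int64_t d : dims) {
    if (d == 0) return true;
  }
  return false;
}

// Hypothetical helper: number of elements described by a shape.
int64_t ElementCount(const std::vector<int64_t>& dims) {
  int64_t n = 1;
  for (int64_t d : dims) n *= d;
  return n;
}

// Copies a DDS output into the destination buffer unless the output is empty.
// std::memcpy is an assumption standing in for cudaMemcpyAsync().
// Returns the number of bytes copied.
std::size_t CopyDdsOutput(const std::vector<int64_t>& dims, const void* src,
                          void* dst, std::size_t elem_size) {
  if (IsEmptyShape(dims)) {
    // Nothing to copy. Writing to dst here could clobber memory that
    // belongs to another tensor, so the copy is skipped entirely.
    return 0;
  }
  assert(src != nullptr && dst != nullptr);
  const std::size_t bytes =
      static_cast<std::size_t>(ElementCount(dims)) * elem_size;
  std::memcpy(dst, src, bytes);
  return bytes;
}
```

With a shape such as `{0, 4}` the function returns 0 and leaves the destination untouched; with `{2, 2}` and `float` elements it copies `4 * sizeof(float)` bytes.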

jywu-msft · 2024-02-27T17:13:56Z

pls fix lint issues

jywu-msft · 2024-02-27T17:16:52Z

I think we can experiment with allocating just a single byte for all zero-size tensor cases (currently we allocate the number of bytes for that datatype).
Or I wonder if we can even pre-allocate a single byte and use the same address everywhere? (We probably need to ask Nvidia/TensorRT what the requirement is.)
That could potentially simplify the code even more.

chilo-ms · 2024-02-28T19:06:04Z

> i think we can experiment with just allocating single byte for all zero tensor cases (currently we allocate number of bytes for that datatype). or i wonder if we can even pre-allocate single byte and use the same address everywhere? (probably need to ask Nvidia/TensorRT what the requirement is) that can potentially simplify the code even more.

I tested with ORT allocating only one byte for each empty tensor with the faster-rcnn model, and it works well.
Also, per the TRT doc:

If an engine binding is an empty tensor, it still needs a non-null memory address, and different tensors should have different addresses. This is consistent with the C++ rule that every object has a unique address, for example, new float[0] returns a non-null pointer. If using a memory allocator that might return a null pointer for zero bytes, ask for at least one byte instead.

Different tensors should have different addresses, but we can have TRT EP allocate only one dummy byte for each empty tensor.
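A minimal host-side sketch of that allocation rule, assuming hypothetical helpers `BytesToAllocate` and `AllocateOutput` (neither is an ORT or TensorRT API, and `std::make_unique` stands in for the device allocator): every empty tensor gets its own one-byte allocation, so each has a distinct non-null address.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical helper: bytes to request for a tensor of the given shape.
// An empty tensor (any dimension equal to 0) gets one dummy byte,
// regardless of its element type.
std::size_t BytesToAllocate(const std::vector<int64_t>& dims,
                            std::size_t elem_size) {
  std::size_t count = 1;
  for (int64_t d : dims) {
    if (d == 0) return 1;  // empty tensor: a single dummy byte
    count *= static_cast<std::size_t>(d);
  }
  return count * elem_size;
}

// Hypothetical host stand-in for a device allocation. Each call returns a
// separate buffer, so two empty tensors never share an address.
std::unique_ptr<unsigned char[]> AllocateOutput(
    const std::vector<int64_t>& dims, std::size_t elem_size) {
  return std::make_unique<unsigned char[]>(BytesToAllocate(dims, elem_size));
}
```

Sharing one pre-allocated byte across all empty tensors would be simpler still, but it would give them the same address, which the quoted TRT guidance advises against.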


chilo-ms and others added 8 commits January 30, 2024 23:41

fix bug

c8496de

handle zero output tensor

1cd785a

update

c8dca23

remove cudaStreamSynchronize after cuda memory copy and cuda cast

9a4ecd9

remove unnecessary code

6e2c1ff

code refactor

649dd1f

code refactor

27dfb6e

Use same logic for empty tensor handle

e18eb3c

lintrunner -a

241252d

chilo-ms added 3 commits February 28, 2024 19:17

only dummy byte needed for empty tensor

10f1f15

TRT 10 supports int64 so no need to cast

2c2c8fc

Revert "TRT 10 supports int64 so no need to cast"

e962201

This reverts commit 2c2c8fc.

chilo-ms marked this pull request as ready for review March 1, 2024 02:40

jywu-msft approved these changes Mar 5, 2024


chilo-ms merged commit d9730c7 into main Mar 5, 2024
94 of 95 checks passed

chilo-ms deleted the chi/trt_dds_fix branch March 5, 2024 22:39

yf711 added a commit that referenced this pull request Mar 14, 2024

Revert "[TensorRT EP] Fix bug for DDS output handling for empty tensor (#19575)"

f8626bc

This reverts commit d9730c7.

chilo-ms mentioned this pull request Jun 26, 2024

TensorrtExecutionProvider slower than CUDAExecutionProvider: Faster-rcnn [Performance] #17434

Open
