speedup #1

Open
ltm920716 opened this issue Aug 23, 2021 · 10 comments

@ltm920716 commented Aug 23, 2021

Hello,
I found that there is no speedup when using TensorRT (FP32, FP16) inference; is that right?

I also found that batch inference with the torch model gives no speedup. I do not know if I am doing something wrong.

@k9ele7en (Owner)

Hi @ltm920716, yes, TensorRT (RT) does speed up inference, because it optimizes the model for the specific GPU it is built (and then run) on.
By batch inference for the torch model, do you mean traditional .pth inference? If so, my repo does not improve that.
If you use RT inside Triton, we can further improve batch inference by enabling the dynamic batcher on the Triton server (https://github.com/triton-inference-server/server/blob/main/docs/architecture.md#dynamic-batcher)...
FYI, TensorRT and Triton can bring further performance gains as we apply more optimizations on them.
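For reference, a minimal sketch of what enabling the dynamic batcher could look like in a Triton `config.pbtxt`; the model name and size values here are assumptions for illustration, not this repo's actual config:

```
# config.pbtxt (illustrative; the model name and sizes are assumptions)
name: "craft_trt"
platform: "tensorrt_plan"
max_batch_size: 8                    # the TRT engine must be built with a dynamic batch dimension
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]     # the server groups individual requests into these batch sizes
  max_queue_delay_microseconds: 100  # how long to wait for more requests before running a batch
}
```

With this, Triton can merge independent single-image requests into one batched forward pass on the server side, which is where the batching gain usually comes from in a deployed setup.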

@ltm920716 (Author)

@k9ele7en, I tested the traditional .pth model on a Tesla V100 (32 GB) and found that there is no speedup. So I think maybe the network is too large to gain from batch inference; I am quite confused.

@k9ele7en (Owner)

@ltm920716, did you test the .pth locally, or a .pt (TorchScript) model on the Triton server (placed in the Model Repository)?

@ltm920716 (Author) commented Aug 23, 2021

> @ltm920716, did you test the .pth locally, or a .pt (TorchScript) model on the Triton server (placed in the Model Repository)?

@k9ele7en I tested the original torch model craft_mlt_25k.pth on torch==1.7.0 with batching, and there is no speedup. Then I tested craft_mlt_25k.trt (torch -> onnx -> trt), and there is no speedup for FP32 either. I measured only the model inference time.

With TensorRT, FP32 gives no speedup, FP16 is faster, and FP16 is the same speed as INT8.
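As a side note on measuring only the model inference time on GPU: CUDA kernels launch asynchronously, so the forward pass should be preceded by warm-up runs and bracketed with torch.cuda.synchronize(), otherwise batch timings can be misleading. A minimal sketch, where the model object and input shapes are placeholders rather than the exact test code used here:

```python
import time
import torch

@torch.no_grad()
def time_forward(net, x, n_warmup=3, n_runs=5):
    """Time only the forward pass; synchronize so async CUDA kernels are included."""
    net.eval()
    for _ in range(n_warmup):          # warm-up (cuDNN autotune, allocator warm-up)
        net(x)
    torch.cuda.synchronize()
    times = []
    for _ in range(n_runs):
        start = time.time()
        net(x)
        torch.cuda.synchronize()       # wait for the GPU to finish before stopping the clock
        times.append(time.time() - start)
    return times

# Hypothetical usage with the input shapes mentioned in this thread:
# net = CRAFT().cuda()                       # assumes the CRAFT model class is available
# x1 = torch.randn(1, 3, 1280, 736).cuda()   # batch 1
# x8 = torch.randn(8, 3, 1280, 736).cuda()   # batch 8
# print(time_forward(net, x1), time_forward(net, x8))
```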

@ltm920716 (Author)

here are the results from the original test.py in the CRAFT repo:

torch.Size([1, 3, 1280, 736])
time up?: 0.05096149444580078
time up?: 0.04998612403869629
time up?: 0.05093955993652344
time up?: 0.05080008506774902
time up?: 0.05109596252441406

torch.Size([8, 3, 1280, 736])
time up?: 0.39319276809692383
time up?: 0.39789867401123047
time up?: 0.39710474014282227
time up?: 0.39400172233581543
time up?: 0.39536428451538086

so I am so confused.
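For what it is worth, dividing the numbers above per image: batch 1 runs at about 0.050 s per image, while batch 8 takes about 0.395 s / 8 ≈ 0.049 s per image, so per-image latency is essentially flat and total time grows linearly with batch size. One possible reading (an assumption, not a measured fact) is that a single 3x1280x736 input already saturates the GPU, so batching leaves little throughput headroom at this input size. A quick check of the arithmetic:

```python
# Averages of the timings reported above (seconds).
batch1_runs = [0.05096, 0.04999, 0.05094, 0.05080, 0.05110]
batch8_runs = [0.39319, 0.39790, 0.39710, 0.39400, 0.39536]

per_image_b1 = sum(batch1_runs) / len(batch1_runs) / 1   # ~0.051 s per image at batch 1
per_image_b8 = sum(batch8_runs) / len(batch8_runs) / 8   # ~0.049 s per image at batch 8
print(per_image_b1, per_image_b8)                        # roughly equal -> no throughput gain
```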

@k9ele7en (Owner)

> @k9ele7en I tested the original torch model craft_mlt_25k.pth on torch==1.7.0 with batching, and there is no speedup. Then I tested craft_mlt_25k.trt (torch -> onnx -> trt), and there is no speedup for FP32 either. I measured only the model inference time.
>
> With TensorRT, FP32 gives no speedup, FP16 is faster, and FP16 is the same speed as INT8.

@ltm920716, yes, the model needs to be large enough for the RT engine to make a difference in performance (time). I am not sure CRAFT is big enough; this repo is just an example for a large-scale solution. TensorRT + Triton are often combined in large deployed inference solutions, such as in medical or manufacturing businesses...

@k9ele7en (Owner)

> here are the results from the original test.py in the CRAFT repo:
> torch.Size([1, 3, 1280, 736]): ~0.050 s per run
> torch.Size([8, 3, 1280, 736]): ~0.395 s per run
>
> so I am so confused.

@ltm920716, I have not benchmarked or run batching experiments with RT yet. But currently in the config I fix the batch (dynamic input) size to 1 for all three values (min, max, opt); you can try again by setting a larger max batch size before exporting to ONNX and then building the RT engine...
(for example, `_C.INFERENCE.TRT_MIN_SHAPE = (1, 3, 256, 256)`)

Please refer to this: https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#batching
Btw, why don't you pass x with batch size 3 to the .pth model and then compare with the RT engine...
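For illustration, a rough sketch (not this repo's actual export script) of how a wider dynamic batch range could be set when building the RT engine with the TensorRT Python API. The ONNX file name, input tensor name, and shapes are assumptions, and the ONNX model must have been exported with a dynamic batch axis:

```python
import tensorrt as trt

# Hypothetical min/opt/max shapes mirroring the TRT_*_SHAPE values in the config.
MIN_SHAPE = (1, 3, 1280, 736)
OPT_SHAPE = (4, 3, 1280, 736)
MAX_SHAPE = (8, 3, 1280, 736)

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("craft.onnx", "rb") as f:   # assumed file; export with dynamic_axes={"input": {0: "batch"}}
    parser.parse(f.read())

config = builder.create_builder_config()
profile = builder.create_optimization_profile()
# "input" is the assumed ONNX input name; check network.get_input(0).name in practice
profile.set_shape("input", MIN_SHAPE, OPT_SHAPE, MAX_SHAPE)
config.add_optimization_profile(profile)

engine = builder.build_engine(network, config)  # build_serialized_network() in newer TRT versions
```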

@ltm920716 (Author)

#1 (comment)
@k9ele7en thanks!
the test above is only on the original torch model, and I found that batch inference brings no improvement. I will compare with RT next. Thanks again!

@k9ele7en (Owner)

@ltm920716 no problem, it would be great if you share the results (both good and bad) of your experiments so that we can discuss them, and people can find useful information and avoid mistakes in the future...

@ltm920716 (Author)

> @ltm920716 no problem, it would be great if you share the results (both good and bad) of your experiments so that we can discuss them, and people can find useful information and avoid mistakes in the future...

OK, I will try.
