speedup #1
Hi @ltm920716, yes, TensorRT (TRT) does speed up inference, because it optimizes the model for the specific GPU it is built on (and then run on).
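For context, a minimal sketch of the kind of per-GPU engine build being described, using the TensorRT Python API; the ONNX/engine file names, workspace size, and FP16 flag are illustrative, and exact API names vary between TensorRT versions:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path="craft.onnx", use_fp16=True):
    """Build a TensorRT engine from an ONNX file on the current GPU."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse the ONNX model")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GB scratch space for tactic selection
    if use_fp16 and builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    # Kernel tactics are selected for the GPU this runs on,
    # so the engine should be rebuilt per device.
    engine = builder.build_engine(network, config)
    with open("craft.engine", "wb") as f:
        f.write(engine.serialize())
    return engine
```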
@k9ele7en, I tested the original .pth model on a Tesla V100 (32 GB) and found no speedup. So I think maybe the network layers are too large to get a speedup from batch inference; I am quite confused.
@ltm920716, did you test the .pth locally, or the .pt (TorchScript) on the Triton server (placed in the Model Repository)?
@k9ele7en, I tested the original Torch model craft_mlt_25k.pth on torch==1.7.0 with batching, and there is no speedup. With TensorRT, I found that FP32 gives no speedup, FP16 is faster, and FP16 runs at the same speed as INT8.
@ltm920716, yes, the model needs to be large enough for the TRT engine to make a difference in performance (time). I am not sure CRAFT is big enough; it is just an example for a large-scale solution. TensorRT + Triton are often combined in large inference deployments, for example in medical and manufacturing businesses...
@ltm920716, I have not benchmarked or experimented with batching in TRT yet. Currently, in the config, I set the batch (dynamic input) fixed to 1 for all three values (min, opt, max); you can try again by setting a larger max batch size before exporting to ONNX and then building the TRT engine (see the sketch below).
Please refer to this: https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#batching
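A rough sketch of that suggestion, assuming `net` is the loaded CRAFT model and reusing `builder`, `network`, and `config` from the engine-build sketch above; the input/output names, spatial size, and batch sizes are illustrative:

```python
import torch

# 1) Re-export ONNX with a dynamic batch axis instead of a fixed batch of 1.
dummy = torch.randn(1, 3, 768, 768).cuda()
torch.onnx.export(
    net, dummy, "craft_dynamic.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=11,
)

# 2) When building the engine, widen the optimization profile
#    instead of fixing min = opt = max = 1.
profile = builder.create_optimization_profile()
profile.set_shape("input",
                  (1, 3, 768, 768),   # min
                  (8, 3, 768, 768),   # opt: the batch size used most often
                  (16, 3, 768, 768))  # max: the largest batch you plan to send
config.add_optimization_profile(profile)
engine = builder.build_engine(network, config)
```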
#1 (comment)
@ltm920716, no problem. It would be great if you share the results (both good and bad) of your experiments so that we can discuss them, and so others can find useful information and avoid mistakes in the future...
OK, I will try.
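One thing worth checking when timing the batched .pth runs: CUDA kernels are launched asynchronously, so timings taken without torch.cuda.synchronize() can hide or distort a batching speedup. A minimal timing sketch (the model and input names are assumptions):

```python
import time
import torch

@torch.no_grad()
def per_image_latency(model, batch, warmup=5, iters=20):
    """Average per-image latency of a batched forward pass on GPU."""
    model.eval()
    for _ in range(warmup):       # warm-up so cuDNN autotuning is excluded
        model(batch)
    torch.cuda.synchronize()      # wait for queued kernels before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        model(batch)
    torch.cuda.synchronize()      # wait for async GPU work before stopping the clock
    return (time.perf_counter() - start) / (iters * batch.shape[0])

# Example (shapes illustrative): compare batch=1 vs batch=8
# x1 = torch.randn(1, 3, 768, 768).cuda()
# x8 = torch.randn(8, 3, 768, 768).cuda()
# print(per_image_latency(net, x1), per_image_latency(net, x8))
```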
Hello,
I found that there is a speedup when using TensorRT (FP32, FP16) inference; is that right?
I also found that batch inference with the Torch model gives no speedup. I do not know if I am doing something wrong.