Issues: triton-inference-server/server
Hangs when using the Triton client and multiprocessing simultaneously
#7690 opened Oct 9, 2024 by Soul-Code
Possible bug in reference counting with shared memory regions
Labels: investigating (The development team is investigating this issue)
#7688 opened Oct 8, 2024 by hcho3
Are FP8 models supported in Triton?
Labels: question (Further information is requested)
#7678 opened Oct 4, 2024 by jayakommuru
Triton ONNX Runtime backend slower than the onnxruntime Python client on CPU
Labels: performance (A possible performance tune-up)
#7677 opened Oct 3, 2024 by Mitix-EPI
Histogram metric for multi-instance tail-latency aggregation
#7672 opened Oct 1, 2024 by AshwinAmbal
DCGM unable to start: DCGM initialization error, Error: Failed to initialize NVML
Labels: verify to close (Verifying if the issue can be closed)
#7670 opened Sep 29, 2024 by coder-2014
Is an ensemble of tensorrt + python_be + tensorrt supported on Jetson?
#7667 opened Sep 27, 2024 by olivetom
When there are multiple GPUs, only one GPU is used
Labels: question (Further information is requested), verify to close (Verifying if the issue can be closed)
#7664 opened Sep 27, 2024 by gyr66
Direct Streaming of Model Weights from Cloud Storage to GPU Memory
Labels: enhancement (New feature or request)
#7660 opened Sep 26, 2024 by azsh1725
Deploying a TTS model with Triton and the ONNX backend fails: Protobuf parsing failed
Labels: investigating (The development team is investigating this issue), question (Further information is requested)
#7654 opened Sep 25, 2024 by AnasAlmana
Large performance drop when using an ensemble model instead of separate calls
Labels: investigating (The development team is investigating this issue)
#7650 opened Sep 24, 2024 by jcuquemelle
[Critical] Triton stops processing requests and crashes
Labels: bug (Something isn't working)
#7649 opened Sep 24, 2024 by appearancefnp
python_backend PyTorch example: as_numpy() error
Labels: investigating (The development team is investigating this issue)
#7647 opened Sep 24, 2024 by flian2
Make State Tensor Stay in Device Memory
Labels: question (Further information is requested)
#7643 opened Sep 24, 2024 by poor1017
What is the maximum number of instances Triton can support for parallel inference?
#7641 opened Sep 22, 2024 by wwdok
Incompatible constructor arguments for c_python_backend_utils.InferenceRequest
Labels: investigating (The development team is investigating this issue), question (Further information is requested)
#7639 opened Sep 20, 2024 by adrtsang
Triton GPU deployment suddenly becomes very slow (from 0.03 s to 12 s); how can this be solved?
Labels: question (Further information is requested)
#7638 opened Sep 20, 2024 by yiluzhuimeng
[Feature request] ffmpeg backend to simplify decoding of audio/video inputs
Labels: investigating (The development team is investigating this issue)
#7629 opened Sep 19, 2024 by vadimkantorov
Does Triton Inference Server support custom user features without modifying the original code, e.g. via a plugin mechanism?
Labels: question (Further information is requested)
#7627 opened Sep 19, 2024 by GGBond8488
Dynamic batching doesn't work on the first invocation of a model (Python backend)
#7623 opened Sep 18, 2024 by ChristosCh00