Model repo #91
Comments
Thank you for your question. PyTriton works well for simple use cases where a model is bound directly to a server for deployment, but it has limited feature support and does not integrate with external model stores. For scenarios that require more complex operations, such as dynamic loading and unloading of models, it is recommended to use the Triton Inference Server instead. Triton supports a Python backend, which lets you serve models through Python scripts. For further optimization, you might also explore the Triton Model Navigator, a utility that helps convert models from frameworks like PyTorch to TensorRT to boost performance. For more details, refer to the Python backend documentation and the Triton Model Navigator GitHub repository. Is there anything else you'd like to know or any specific details you need assistance with?
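For readers unfamiliar with the Python backend mentioned above, here is a minimal sketch of a `model.py` that would sit in a Triton model repository next to a `config.pbtxt`. The tensor names `INPUT0`/`OUTPUT0` and the computation are illustrative placeholders, not from this thread:

```python
# model.py -- minimal Triton Python-backend sketch.
# Tensor names below are illustrative and must match the model's config.pbtxt.
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # args carries the model name, config, instance kind, etc.
        self.model_name = args["model_name"]

    def execute(self, requests):
        responses = []
        for request in requests:
            data = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            result = data.astype(np.float32) * 2.0  # placeholder computation
            out_tensor = pb_utils.Tensor("OUTPUT0", result)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses

    def finalize(self):
        pass
```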
@piotrm-nvidia Hello, I am looking for a Python library that lets me perform end-to-end inference with Triton Server without the traditional client-server communication, which consumes a large amount of network I/O. When using the aio gRPC client under modest concurrency (just 100), the communication time is much longer than the ensemble ONNX model's inference time (averaging 0.08 ms): from grpc_send to model INITIALIZED averages 200+ ms, and from model RELEASED to grpc_recv of the result averages 184.27+ ms. I used FastAPI to wrap the aio gRPC client (with client reuse and gzip compression enabled) to call Triton Server. Both the client and Triton Server run inside the same Docker container, and this is the result I observed. Finally, I found this library, but I was disappointed because its functionality is too limited. Do you have any better suggestions?
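For context, a minimal sketch of the setup described above, assuming `tritonclient.grpc.aio` wrapped in FastAPI with one long-lived, reused client. The endpoint URL, model name, tensor names, and payload are placeholders:

```python
# Sketch: FastAPI wrapper around Triton's asyncio gRPC client with client reuse.
# Model/tensor names and the payload below are placeholders.
import numpy as np
import tritonclient.grpc.aio as grpcclient
from tritonclient.grpc import InferInput, InferRequestedOutput
from fastapi import FastAPI

app = FastAPI()
client = None  # one client per process, created at startup and reused


@app.on_event("startup")
async def startup() -> None:
    global client
    # Reusing a single client avoids per-request channel setup overhead.
    client = grpcclient.InferenceServerClient(url="localhost:8001")


@app.post("/infer")
async def infer() -> dict:
    data = np.random.rand(1, 16).astype(np.float32)  # placeholder payload
    inp = InferInput("INPUT0", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    out = InferRequestedOutput("OUTPUT0")
    result = await client.infer(model_name="ensemble_model", inputs=[inp], outputs=[out])
    return {"output": result.as_numpy("OUTPUT0").tolist()}
```

Even with this pattern, every request still pays gRPC serialization and scheduling costs, which is what the latency numbers in the comment above reflect.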
@631068264, if you need to perform inference on Triton without client-server communication, you may find the Triton Python API useful.
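A rough sketch of what the in-process Triton Python API (the `tritonserver` package) looks like, based on its published tutorials; the repository path, model name, tensor names, and the exact output-conversion call are assumptions and may differ from the current API, so check the official docs before relying on them:

```python
# Sketch: in-process inference with the tritonserver Python package,
# pointed at an existing model repository. No gRPC/HTTP hop is involved.
import numpy as np
import tritonserver

server = tritonserver.Server(model_repository="/models")  # placeholder path
server.start(wait_until_ready=True)  # assumed keyword; plain start() also works

model = server.model("my_onnx_model")  # placeholder model name
responses = model.infer(inputs={"INPUT0": np.zeros((1, 16), dtype=np.float32)})

for response in responses:
    # Output tensors support DLPack; exact conversion helper may vary by version.
    out = np.from_dlpack(response.outputs["OUTPUT0"])
    print(out)

server.stop()
```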
@pziecina-nv The throughput using the aio gRPC client is only about 10% of what perf_analyzer achieves (see triton-inference-server/client#815). I don't know how to solve this.
Is it possible to use pytriton to load a full model repository that would otherwise require the full Triton Server Docker container? One of the things I love about pytriton is how easy it is to install on new machines without needing a container. It could be a great go-between.
I imagine projects evolving like this as they mature:
1/ Start with pytriton and no models folder
2/ Add a models folder and still use pytriton
3/ Deploy production with the full Triton container, but continue development with pytriton when containers are not desired
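For reference, PyTriton's current workflow binds a Python callable rather than loading an existing model repository, which is the gap this request describes. A minimal sketch of that pattern; the model name, tensor names, shapes, and the doubling function are illustrative:

```python
# Sketch: serving a Python callable with PyTriton (no model repository needed).
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def infer_fn(INPUT0):
    # Placeholder computation; inputs arrive as batched NumPy arrays.
    return {"OUTPUT0": INPUT0 * 2.0}


with Triton() as triton:
    triton.bind(
        model_name="Doubler",  # illustrative name
        infer_func=infer_fn,
        inputs=[Tensor(name="INPUT0", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="OUTPUT0", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=128),
    )
    triton.serve()  # blocks, exposing the bound model over Triton's endpoints
```

Loading a ready-made repository of ONNX/TensorRT models, as proposed in the list above, would instead require the full Triton container or the in-process Python API shown earlier in the thread.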