Model repo #91

Open
tylerweitzman opened this issue Nov 22, 2024 · 6 comments
Assignees: piotrm-nvidia
Labels: enhancement (New feature or request), question (Further information is requested)

Comments

@tylerweitzman

Is it possible to use pytriton to load a full model repository that would otherwise require the full Triton Server Docker container? One of the things I love about pytriton is how easy it is to install on new machines without needing a container. It could be a great go-between.

I imagine projects evolving like this as they mature (a rough sketch of the pytriton flow for step 1 is below):
1/ Start with pytriton and no models folder
2/ Add a models folder and still use pytriton
3/ Deploy to production with the full Triton container, but continue development with pytriton when containers are not desired
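
For context, step 1 today looks roughly like this (a minimal sketch; the model name, tensor names, and the toy infer function are placeholders, not a real model):

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def infer_fn(input_1):
    # Placeholder "model": replace with a real framework call (PyTorch, ONNX, ...).
    return {"output_1": input_1 * 2}


with Triton() as triton:
    triton.bind(
        model_name="my_model",  # illustrative name
        infer_func=infer_fn,
        inputs=[Tensor(name="input_1", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="output_1", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()  # blocks, exposing Triton's HTTP/gRPC endpoints for this model
```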

piotrm-nvidia self-assigned this Nov 22, 2024
piotrm-nvidia added the enhancement and question labels Nov 22, 2024
@piotrm-nvidia
Collaborator

Thank you for your question.

PyTriton works well for simple use cases where a Python inference function is bound directly to a server for deployment, but its feature set is limited and it does not integrate with external model stores. For scenarios that require more complex operations, such as dynamically loading and unloading models from a repository, the Triton Inference Server itself is recommended. Its Python backend enables serving models via Python scripts.

For further optimization, you might also explore the Triton Model Navigator, a utility that converts models from frameworks like PyTorch to TensorRT to boost performance. For more detailed information, see the Python backend documentation and the Triton Model Navigator GitHub repository.
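
For reference, serving a model from a model repository through the Python backend means adding a model.py like the sketch below (a minimal outline with illustrative tensor names, not a drop-in implementation):

```python
# models/my_model/1/model.py  (illustrative layout inside a Triton model repository)
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Load weights, sessions, or tokenizers here; `args` carries the model config.
        pass

    def execute(self, requests):
        responses = []
        for request in requests:
            input_0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            result = input_0.as_numpy() * 2  # placeholder for real inference
            output_0 = pb_utils.Tensor("OUTPUT0", result.astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[output_0]))
        return responses

    def finalize(self):
        # Release resources when the server unloads the model.
        pass
```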

Is there anything else you'd like to know or any specific details you need assistance with?

github-actions bot commented Dec 14, 2024

This issue is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot added the Stale label Dec 14, 2024
@631068264 commented Dec 17, 2024

@piotrm-nvidia Hello

I am looking for a Python library that lets me perform end-to-end inference with Triton Server without the traditional client-server communication, which consumes a large amount of network I/O. When using the aio gRPC client under fairly low concurrency (just 100), the communication time is much longer than the ensemble ONNX model's inference time (averaging 0.08 ms): grpc_send to model INITIALIZED averages 200+ ms, and model RELEASED to grpc_recv averages 184.27+ ms.

I used FastAPI to wrap the aio gRPC client (with client reuse and gzip compression enabled) to call Triton Server. Both the client and Triton Server run inside the same Docker container, and this is the result I observed.
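
For what it's worth, my wrapper looks roughly like this (a rough sketch of the setup, assuming tritonclient.grpc.aio; the endpoint, model, and tensor names are placeholders):

```python
import numpy as np
import tritonclient.grpc.aio as grpcclient
from fastapi import FastAPI

app = FastAPI()
# Reuse one aio client for all requests instead of reconnecting per call.
client = grpcclient.InferenceServerClient(url="localhost:8001")


@app.post("/infer")
async def infer():
    data = np.random.rand(1, 3).astype(np.float32)  # placeholder payload
    infer_input = grpcclient.InferInput("INPUT0", list(data.shape), "FP32")
    infer_input.set_data_from_numpy(data)
    result = await client.infer(
        model_name="my_ensemble",      # illustrative ensemble model name
        inputs=[infer_input],
        compression_algorithm="gzip",  # gzip enabled, as mentioned above
    )
    return {"output": result.as_numpy("OUTPUT0").tolist()}
```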

Finally, I found this library, but I was disappointed because its functionality is too limited. Do you have any better suggestions?


github-actions bot commented Jan 9, 2025

This issue is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot added the Stale label Jan 9, 2025
@pziecina-nv
Collaborator

@631068264, if you need to perform inference on Triton without client-server communication, you may find the Triton Python API useful.
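
For example, a minimal sketch with the tritonserver in-process Python API looks roughly like this (repository path, model, and tensor names are placeholders, and it assumes a CPU-accessible output; please check the API docs for the exact signatures):

```python
import numpy as np
import tritonserver

# Start Triton in the same process, pointing at an existing model repository.
server = tritonserver.Server(model_repository="/workspace/models")
server.start()

model = server.model("my_model")  # illustrative model name
responses = model.infer(inputs={"INPUT0": np.random.rand(1, 3).astype(np.float32)})

for response in responses:
    # Output tensors support DLPack; this assumes the output lives in CPU memory.
    output = np.from_dlpack(response.outputs["OUTPUT0"])
    print(output)

server.stop()
```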

@631068264 commented Jan 9, 2025

@pziecina-nv
Thanks for your reply, but I tried it a week ago and got a slower result. Through multiple attempts, I found that currently only the gRPC aio client is relatively fast.

The throughput using aio grpc is only about 10% of what's achieved with perf_analyzer.

triton-inference-server/client#815

I don't know how to solve this.

github-actions bot removed the Stale label Jan 10, 2025