Device CUDA update, caused model to stop running. #2455

Open
4 tasks
chrisreese-if opened this issue Nov 18, 2024 · 0 comments
Labels
bug (Something isn't working), installation, triaged (Issue has been triaged by maintainers)

Comments


System Info

  • CPU: x86_64
  • GPU: H100
  • CUDA: 12.4

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

We built our model using the tritonserver 24.10 container; the base server's CUDA was at 12.4 at the time. We use a cloud provider for our GPU infrastructure. They updated their CUDA version to 12.7, and our trt-build model stopped working (the error reported a CUDA mismatch).

But we are still using tritonserver 24.10, so it shouldn't matter? If we use triton 24.10 in compatibility mode with CUDA 12.4, 12.5 and 12.7, do we need 3 different trt builds? And a fourth for 12.6? Is triton really that sensitive to the CUDA version?

Expected behavior

If the Triton container version and the GPU config are the same, it should just work.

actual behavior

launch_triton_server fails.

additional notes

Triton server container version: 24.10
GPU: H100 SXM 80GB
Base server CUDA during build: 12.4
Server CUDA after update: 12.7
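
For anyone trying to reproduce this, the host-side versions listed above could be recorded with something like the sketch below. It assumes the CUDA version is read from the `CUDA Version: X.Y` field in the `nvidia-smi` banner, which reflects what the installed driver supports, not the toolkit inside the container. The helper names (`parse_cuda_version`, `host_cuda_version`, `describe_change`) are hypothetical and not part of Triton or TensorRT-LLM. It only reports whether the host version moved; it does not explain or fix the mismatch.

```python
import re
import subprocess

# Matches the "CUDA Version: X.Y" field printed in the nvidia-smi banner.
_CUDA_RE = re.compile(r"CUDA Version:\s*(\d+)\.(\d+)")


def parse_cuda_version(smi_output):
    """Return (major, minor) parsed from nvidia-smi text, or None if absent."""
    match = _CUDA_RE.search(smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def host_cuda_version():
    """Run nvidia-smi on this host and parse its banner (needs an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=True
    )
    return parse_cuda_version(result.stdout)


def describe_change(build_version, current_version):
    """Summarize whether the host CUDA version differs from the one at build time."""
    if build_version == current_version:
        return "unchanged"
    return "changed: {}.{} -> {}.{}".format(*build_version, *current_version)
```

With the values from this report, `describe_change((12, 4), (12, 7))` returns `"changed: 12.4 -> 12.7"`. On a GPU host, `host_cuda_version()` can supply the second argument.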

chrisreese-if added the bug (Something isn't working) label Nov 18, 2024
hello-11 added the triaged (Issue has been triaged by maintainers) label Nov 20, 2024