Is there a way to reduce the HAMI-Core verbosity level for workloads? #544

4gt-104 · 2024-10-10T04:26:42Z

Please provide an in-depth description of the question you have:

I reviewed the HAMI-Core and confirmed that the verbosity level can be reduced by setting the LIBCUDA_LOG_LEVEL environment variable. However, configuring this for every GPU pod can be tedious.

Is there a way to set the verbosity level through HAMi’s Helm chart or scheduler configuration instead?

What do you think about this question?:
I believe users should be able to configure this parameter easily, and it could be integrated with the existing admission webhook. Additionally, I recommend setting the default HAMI-Core verbosity level to 0, so that behavior is consistent with NVIDIA’s device plugin.

Environment:

HAMi version: 2.4.0
Kubernetes version: 1.26.5
Others: -

wawa0210 · 2024-10-10T07:36:44Z

There is no good solution at the moment.

One option would be for HAMi's webhook to read a global configuration and set this parameter on pods. I'm not sure whether that is feasible; it needs to be tried.

archlitchi · 2024-10-11T03:32:14Z

You can modify the mutatingWebhookConfiguration in HAMi to add the env LIBCUDA_LOG_LEVEL=0 to GPU pods. By the way, do you have a WeChat or LinkedIn account?
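
For illustration only, the kind of mutation suggested here can be sketched as a function that builds a JSON Patch (RFC 6902) for a pod. This is not HAMi's webhook code (HAMi is written in Go); the function names are hypothetical, and the sketch assumes GPU containers are identified by a `nvidia.com/gpu` limit and that a value already set by the user should be left alone.

```python
# Hypothetical sketch: build JSON Patch operations that add an env var
# to every GPU container in a pod. Not taken from HAMi's source.

GPU_RESOURCE = "nvidia.com/gpu"  # assumption: resource name that marks a GPU container


def requests_gpu(container):
    """Return True if the container declares a limit for GPU_RESOURCE."""
    limits = container.get("resources", {}).get("limits", {})
    return GPU_RESOURCE in limits


def build_env_patch(pod, name, value):
    """Return a list of JSON Patch 'add' operations for the given pod dict."""
    ops = []
    for i, container in enumerate(pod["spec"]["containers"]):
        if not requests_gpu(container):
            continue
        env = container.get("env")
        entry = {"name": name, "value": value}
        if env is None:
            # No env list yet: create it with the single entry.
            ops.append({"op": "add",
                        "path": f"/spec/containers/{i}/env",
                        "value": [entry]})
        elif all(e["name"] != name for e in env):
            # Env list exists and the variable is not set: append to it.
            ops.append({"op": "add",
                        "path": f"/spec/containers/{i}/env/-",
                        "value": entry})
        # Otherwise the user already set the variable; leave it untouched.
    return ops
```

A real mutating webhook would return such a patch, base64-encoded, in its AdmissionReview response.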

4gt-104 · 2024-10-11T04:45:57Z

@archlitchi thanks for the reply. I will try to implement setting LIBCUDA_LOG_LEVEL during admission.
Unfortunately I don't have WeChat, but I do have a LinkedIn account.

4gt-104 · 2024-10-12T15:17:49Z

I have reviewed the code and believe it can be easily implemented, but I have a concern regarding ArgoCD and GitOps. Overriding the pod spec, whether it's to modify the environment variable for visible CUDA devices or any other environment variable, would likely trigger an out-of-sync state.

@archlitchi what do you think?

4gt-104 · 2024-10-13T06:33:36Z

I tested various scenarios, and the out-of-sync state is triggered only when bare pod manifests are applied via ArgoCD and those manifests set environment variables that the admission webhook can modify. Given this, I think the best solution is to proceed with the environment variable mutation approach and add a note about this case to the documentation.
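
The observation above can be modeled roughly as follows. This is a simplified sketch of the reported behavior, not ArgoCD's actual diff algorithm: it assumes drift is flagged only for variables that the manifest itself sets and whose live value differs, while variables that are merely added at admission are ignored. The function names are hypothetical.

```python
# Hypothetical model of the reported drift condition; not ArgoCD's diff logic.


def env_map(pod):
    """Return {(container_name, var_name): value} for every env var in the pod."""
    result = {}
    for container in pod["spec"]["containers"]:
        for entry in container.get("env", []):
            result[(container["name"], entry["name"])] = entry.get("value")
    return result


def overridden_env(desired, live):
    """List (container, variable) pairs set in `desired` whose live value differs."""
    desired_env = env_map(desired)
    live_env = env_map(live)
    return sorted(
        key for key, value in desired_env.items()
        if key in live_env and live_env[key] != value
    )
```

Under this model, a manifest that does not set the variable stays in sync even if the webhook injects it, which matches the scenario described in the comment.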

archlitchi · 2024-10-15T07:27:10Z

I haven't tried submitting tasks with ArgoCD. I think we can add a log-level field in values.yaml. It can be set to 2 (the default log level: errors, warnings, and messages), 0 (errors only), 3 (errors, warnings, messages, and infos), or 4 (debugs, messages, infos, warnings, and errors). We would only patch the 'LIBCUDA_LOG_LEVEL' env into the container if the value is not set to 2.
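
As a rough sketch of that rule, the configured level could be mapped to an optional env entry as shown below. The level descriptions come from the comment above; the function and constant names are hypothetical, and rejecting levels that the comment does not list (such as 1) is an assumption of this sketch, not documented behavior.

```python
# Hypothetical sketch of "patch LIBCUDA_LOG_LEVEL only when the level is not the default".

LOG_LEVELS = {
    0: "errors only",
    2: "errors, warnings and messages (default)",
    3: "errors, warnings, messages and infos",
    4: "debugs, messages, infos, warnings and errors",
}
DEFAULT_LOG_LEVEL = 2


def log_level_env(level):
    """Return the env entry to inject, or None when the default level is configured."""
    if level not in LOG_LEVELS:
        # Assumption: only the levels described in the thread are accepted here.
        raise ValueError(f"unsupported log level: {level}")
    if level == DEFAULT_LOG_LEVEL:
        return None  # default level: leave the pod spec untouched
    return {"name": "LIBCUDA_LOG_LEVEL", "value": str(level)}
```

Skipping the patch at the default level means pods are left unmodified unless an operator explicitly chooses a different verbosity.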

4gt-104 added a commit to 4gt-104/HAMi that referenced this issue Oct 19, 2024

Add support for setting libcuda verbosity

c9b52c9

Resolves: Project-HAMi#544 Signed-off-by: Tigran Grigoryan <[email protected]>

4gt-104 linked a pull request Oct 19, 2024 that will close this issue

Add support for setting libcuda verbosity #563

Draft

wawa0210 added this to the v2.5 milestone Nov 15, 2024
