When installing the NVML integration, I get the following error:
Loading Errors
nvml
----
Core Check Loader:
Check nvml not found in Catalog
JMX Check Loader:
check is not a jmx check, or unable to determine if it's so
Python Check Loader:
unable to import module 'nvml': No module named 'nvml'
Looking at the debug logs:
2024-05-11 18:18:54 CST | CORE | DEBUG | (pkg/collector/python/loader.go:158 in Load) | Unable to load python module - datadog_checks.nvml: unable to import module 'datadog_checks.nvml': Traceback (most recent call last):
File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/datadog_checks/nvml/__init__.py", line 5, in <module>
from .nvml import NvmlCheck
File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/datadog_checks/nvml/nvml.py", line 16, in <module>
from .api_pb2 import ListPodResourcesRequest
File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/datadog_checks/nvml/api_pb2.py", line 25, in <module>
_LISTPODRESOURCESREQUEST = _descriptor.Descriptor(
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/google/protobuf/descriptor.py", line 296, in __new__
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
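To check whether a given protobuf install falls into the range the error message describes, something like the following may help. This is only a rough sketch based on the wording of the traceback above (versions after 3.20.x reject old generated `_pb2.py` files unless the pure-Python implementation is selected). The helper names `parse_version` and `old_pb2_files_rejected` are hypothetical and are not part of the Agent or of protobuf.

```python
import os
import re


def parse_version(version):
    """Turn a version string such as '4.25.1' or '3.20.0rc1' into a tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def old_pb2_files_rejected(protobuf_version, implementation=None):
    """Heuristic taken from the error text, not an official protobuf API.

    Returns True when outdated generated _pb2.py files are expected to fail:
    protobuf is newer than 3.20.x and the pure-Python implementation
    has not been selected through the environment variable.
    """
    if implementation is None:
        implementation = os.environ.get("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "")
    if implementation == "python":
        return False
    return parse_version(protobuf_version)[:2] > (3, 20)


if __name__ == "__main__":
    try:
        import google.protobuf
    except ImportError:
        print("protobuf is not installed in this interpreter")
    else:
        version = google.protobuf.__version__
        print(version, "rejects old _pb2 files:", old_pb2_files_rejected(version))
```

The result is only meaningful when the script is run with the Python interpreter embedded in the Agent (the traceback paths suggest it lives under `/opt/datadog-agent/embedded`), since that is the protobuf copy the nvml check actually imports.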
To fix this issue:
Use the NVIDIA DCGM Exporter:
This method is the recommended best practice, as the feature is owned and supported by Datadog. The accompanying documentation includes an example configuration that performs the same function as the NVML integration.
NVIDIA DCGM Exporter: https://docs.datadoghq.com/integrations/dcgm/?tab=hostdocker#overview
The google/protobuf library isn't installed directly by the nvml check; it is packaged with the Datadog Agent, so the nvml check itself will need to be updated to resolve this issue. See the nvml manifest.json on GitHub.
Downgrade to Agent v7.50.3. This may have started now because Agent v7.51.0 upgraded the bundled Python from 3.9 to 3.11, which would also have updated bundled libraries such as google/protobuf.
All of those fixes seem reasonable. As Datadog officially supports the NVIDIA DCGM Exporter now, I've deprecated the nvml plugin internally. It may be best to mark it as deprecated here as well. Someone could also modify the plugin so that it refuses to install on newer Datadog Agent versions, but I won't have time to contribute this.
datadog-agent updates have broken this integration for me as well. I've been able to use the DCGM exporter, but it requires running the DCGM exporter container, which is less than ideal on a machine that doesn't run Docker.