DeepFRI not working on recent versions of CUDA / Tensorflow #55

Ubiquinone-dot · 2024-07-11T10:09:32Z

I'm unable to get DeepFRI working on my local machine.
My CUDA driver version is 12.3, and I can install TensorFlow 2.16.2.
I believe the required versions are CUDA 10.1 (the log below loads libcudart.so.10.1) and TensorFlow 2.3.1, but I'd like to keep my drivers at 12.3.

Details:

With the default installation (pip install .), tensorflow-gpu 2.3.1 does not work with my CUDA version.

With Python 3.7:

>>> python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-07-11 09:53:57.153691: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2024-07-11 09:53:57.912333: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2024-07-11 09:53:58.038477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0001:00:00.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 15.57GiB deviceMemoryBandwidth: 298.08GiB/s
2024-07-11 09:53:58.038517: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2024-07-11 09:53:58.040400: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2024-07-11 09:53:58.042116: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2024-07-11 09:53:58.042411: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2024-07-11 09:53:58.044346: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2024-07-11 09:53:58.045340: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2024-07-11 09:53:58.045554: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2024-07-11 09:53:58.045570: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
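
The log above shows every CUDA library loading except libcudnn.so.7. A minimal sketch of how to confirm which shared libraries the dynamic loader can actually open, independent of TensorFlow, follows. The helper names `can_load` and `missing_libraries` are hypothetical, and the library list is copied from the log rather than from any official requirements table.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


def missing_libraries(names):
    """Return the subset of names that the loader cannot open."""
    return [name for name in names if not can_load(name)]


# Library names as they appear in the TensorFlow 2.3.1 log (an assumption
# about what this build needs, not an authoritative list).
REQUIRED = [
    "libcudart.so.10.1",
    "libcublas.so.10",
    "libcufft.so.10",
    "libcudnn.so.7",
]

if __name__ == "__main__":
    for name in missing_libraries(REQUIRED):
        print("cannot load:", name)
```

If only libcudnn.so.7 is reported, the problem is a missing cuDNN 7 on the library path rather than the CUDA toolkit libraries that did load.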

With Python 3.8:

>>> python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/python/__init__.py", line 40, in <module>
    from tensorflow.python.eager import context
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 32, in <module>
    from tensorflow.core.framework import function_pb2
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/core/framework/function_pb2.py", line 16, in <module>
    from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/core/framework/attr_value_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/core/framework/tensor_pb2.py", line 16, in <module>
    from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/core/framework/resource_handle_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/core/framework/tensor_shape_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "/data/jasper/.local/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 553, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
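
The workarounds named in the error message can be checked from Python before TensorFlow is imported. The sketch below is one way to do that, assuming the 3.20.x threshold quoted in the message; the helper names `protobuf_too_new` and `installed_protobuf_version` are hypothetical, and this has not been tested against DeepFRI itself.

```python
import os
import re
from importlib import metadata  # available from Python 3.8


def protobuf_too_new(version: str) -> bool:
    """Return True if the version is newer than the 3.20.x series."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return (major, minor) > (3, 20)


def installed_protobuf_version():
    """Return the installed protobuf version string, or None if absent."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_protobuf_version()
    if version is not None and protobuf_too_new(version):
        # The environment-variable workaround from the message: slower
        # pure-Python parsing. It must be set before tensorflow is imported.
        os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "python")
        print(f"protobuf {version} is newer than 3.20.x; using pure-Python parsing")
```

Downgrading protobuf to 3.20.x or lower, the first workaround in the message, avoids the slowdown of the pure-Python implementation.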

With the most recent version of TensorFlow (2.16.2), the GPU is detected fine:

>>> python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-07-11 10:04:07.184788: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-11 10:04:07.184845: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-11 10:04:07.186709: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-11 10:04:07.193000: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

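But predict.py then fails to load the model weights, which I believe is because CuDNNLSTM is no longer a Keras layer (the final error is about the time_major argument passed to LSTM):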

>>> python predict.py --pdb_dir ./examples/pdb_files -ont mf --saliency --use_guided_grads
2024-07-11 10:06:40.479994: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-11 10:06:40.480050: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-11 10:06:40.481994: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-11 10:06:40.488387: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-11 10:06:42.516791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14653 MB memory:  -> device: 0, name: Tesla T4, pci bus id: 0001:00:00.0, compute capability: 7.5
Traceback (most recent call last):
  File "/data/jasper/PDB-to-Seq-annotation/DeepFRI/predict.py", line 35, in <module>
    predictor = Predictor(models[ont], gcn=gcn)
  File "/data/jasper/PDB-to-Seq-annotation/DeepFRI/deepfrier/Predictor.py", line 61, in __init__
    self._load_model()
  File "/data/jasper/PDB-to-Seq-annotation/DeepFRI/deepfrier/Predictor.py", line 74, in _load_model
    self.model = tf.keras.models.load_model(self.model_prefix + '.hdf5',
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/saving/saving_api.py", line 189, in load_model
    return legacy_h5_format.load_model_from_hdf5(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/legacy_h5_format.py", line 133, in load_model_from_hdf5
    model = saving_utils.model_from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/saving_utils.py", line 85, in model_from_config
    return serialization.deserialize_keras_object(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/serialization.py", line 495, in deserialize_keras_object
    deserialized_obj = cls.from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/model.py", line 521, in from_config
    return functional_from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/functional.py", line 477, in functional_from_config
    process_layer(layer_data)
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/functional.py", line 457, in process_layer
    layer = saving_utils.model_from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/saving_utils.py", line 85, in model_from_config
    return serialization.deserialize_keras_object(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/serialization.py", line 495, in deserialize_keras_object
    deserialized_obj = cls.from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/model.py", line 521, in from_config
    return functional_from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/functional.py", line 477, in functional_from_config
    process_layer(layer_data)
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/functional.py", line 457, in process_layer
    layer = saving_utils.model_from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/saving_utils.py", line 85, in model_from_config
    return serialization.deserialize_keras_object(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/serialization.py", line 504, in deserialize_keras_object
    deserialized_obj = cls.from_config(cls_config)
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/layers/rnn/lstm.py", line 679, in from_config
    return cls(**config)
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/layers/rnn/lstm.py", line 486, in __init__
    super().__init__(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/layers/rnn/rnn.py", line 204, in __init__
    super().__init__(**kwargs)
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/layers/layer.py", line 266, in __init__
    raise ValueError(
ValueError: Unrecognized keyword arguments passed to LSTM: {'time_major': False}
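
The final error says the saved layer config carries a `time_major` key that the installed Keras LSTM rejects. One possible direction, shown only as a sketch, is to remove that key from the model config before the layers are rebuilt. The function `strip_keys` and the sample config below are hypothetical illustrations (the layer name and unit count are made up), and nothing here has been tested against the DeepFRI checkpoints.

```python
import copy


def strip_keys(config, unwanted=("time_major",)):
    """Return a copy of a nested config with the unwanted dict keys removed."""
    if isinstance(config, dict):
        return {
            key: strip_keys(value, unwanted)
            for key, value in config.items()
            if key not in unwanted
        }
    if isinstance(config, list):
        return [strip_keys(item, unwanted) for item in config]
    return copy.deepcopy(config)


# Hypothetical config fragment shaped like a Keras functional model config.
sample = {
    "class_name": "Functional",
    "config": {
        "layers": [
            {
                "class_name": "LSTM",
                "config": {"name": "lstm_1", "units": 512, "time_major": False},
            }
        ]
    },
}

cleaned = strip_keys(sample)
```

How the cleaned config is fed back into Keras (for example, rebuilding the model from the config and loading the weights separately) is left out, and other incompatibilities may remain. Separately, TensorFlow 2.16 can be pointed at the legacy Keras 2 implementation by installing the tf-keras package and setting TF_USE_LEGACY_KERAS=1, which may be a simpler route if it accepts the saved model.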

If anyone is able to give more information about a working installation of their own (that doesn't require a downgrade of CUDA) that'd be super useful!

Much appreciated,
Jasper
