DeepFRI not working on recent versions of CUDA / Tensorflow #55

Open
Ubiquinone-dot opened this issue Jul 11, 2024 · 0 comments
Ubiquinone-dot commented Jul 11, 2024

I'm unable to get DeepFRI working on my local machine.
My CUDA driver version is 12.3, and I can install tensorflow version 2.16.2.
I believe the required versions are CUDA 10.1 and tensorflow 2.3.1, but I'd like to keep my drivers at 12.3.

Details:

  1. With default installation (pip install .), tensorflow-gpu 2.3.1 does not work with my CUDA version.

With Python 3.7:

$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-07-11 09:53:57.153691: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2024-07-11 09:53:57.912333: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2024-07-11 09:53:58.038477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0001:00:00.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 15.57GiB deviceMemoryBandwidth: 298.08GiB/s
2024-07-11 09:53:58.038517: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2024-07-11 09:53:58.040400: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2024-07-11 09:53:58.042116: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2024-07-11 09:53:58.042411: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2024-07-11 09:53:58.044346: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2024-07-11 09:53:58.045340: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2024-07-11 09:53:58.045554: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2024-07-11 09:53:58.045570: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
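
In the log above, every CUDA 10 library opens except libcudnn.so.7, so cuDNN 7 looks like the only piece missing for this tensorflow build. A minimal sketch for pulling the missing library names out of such a log follows. The helper name `missing_gpu_libraries` is made up for illustration, and the pattern only matches the "Could not load dynamic library" wording seen here.

```python
import re

# Matches the warning wording TensorFlow printed in the log above.
_MISSING = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_gpu_libraries(log_text):
    """Return the libraries the log reports as not loadable, in order, without duplicates."""
    seen = []
    for name in _MISSING.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen
```

Feeding it the captured stderr of the `python3 -c ...` command should list only the libraries that still need to be installed or added to the library path.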

With Python 3.8:

$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/python/__init__.py", line 40, in <module>
    from tensorflow.python.eager import context
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 32, in <module>
    from tensorflow.core.framework import function_pb2
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/core/framework/function_pb2.py", line 16, in <module>
    from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/core/framework/attr_value_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/core/framework/tensor_pb2.py", line 16, in <module>
    from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/core/framework/resource_handle_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/core/framework/tensor_shape_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "/data/jasper/.local/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 553, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
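
The protobuf error above says the generated tensorflow 2.3.1 files need protobuf 3.20.x or lower. A small sketch of that version check follows. The function name `protobuf_too_new` is hypothetical, and it assumes a version string of the usual `major.minor.patch` form.

```python
def protobuf_too_new(version):
    """Return True if the protobuf version is newer than the 3.20.x series.

    Only the major and minor components are compared, which mirrors the
    "3.20.x or lower" wording of the error message.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) > (3, 20)


# Possible use on Python 3.8+ (not executed here):
#   from importlib.metadata import version
#   print(protobuf_too_new(version("protobuf")))
```

If it returns True, the first workaround in the message (pinning protobuf to 3.20.x or lower) is presumably the one to try before the slower pure-Python parser.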

  2. Using the most recent version of tensorflow (2.16.2) works fine for GPU detection:

$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-07-11 10:04:07.184788: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-11 10:04:07.184845: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-11 10:04:07.186709: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-11 10:04:07.193000: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

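But predict.py fails to load the model weights. I assume this is because CuDNNLSTM is no longer a Keras layer; the error itself is Keras rejecting the saved time_major argument to LSTM: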

$ python predict.py --pdb_dir ./examples/pdb_files -ont mf --saliency --use_guided_grads
2024-07-11 10:06:40.479994: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-11 10:06:40.480050: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-11 10:06:40.481994: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-11 10:06:40.488387: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-11 10:06:42.516791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14653 MB memory:  -> device: 0, name: Tesla T4, pci bus id: 0001:00:00.0, compute capability: 7.5
Traceback (most recent call last):
  File "/data/jasper/PDB-to-Seq-annotation/DeepFRI/predict.py", line 35, in <module>
    predictor = Predictor(models[ont], gcn=gcn)
  File "/data/jasper/PDB-to-Seq-annotation/DeepFRI/deepfrier/Predictor.py", line 61, in __init__
    self._load_model()
  File "/data/jasper/PDB-to-Seq-annotation/DeepFRI/deepfrier/Predictor.py", line 74, in _load_model
    self.model = tf.keras.models.load_model(self.model_prefix + '.hdf5',
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/saving/saving_api.py", line 189, in load_model
    return legacy_h5_format.load_model_from_hdf5(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/legacy_h5_format.py", line 133, in load_model_from_hdf5
    model = saving_utils.model_from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/saving_utils.py", line 85, in model_from_config
    return serialization.deserialize_keras_object(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/serialization.py", line 495, in deserialize_keras_object
    deserialized_obj = cls.from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/model.py", line 521, in from_config
    return functional_from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/functional.py", line 477, in functional_from_config
    process_layer(layer_data)
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/functional.py", line 457, in process_layer
    layer = saving_utils.model_from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/saving_utils.py", line 85, in model_from_config
    return serialization.deserialize_keras_object(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/serialization.py", line 495, in deserialize_keras_object
    deserialized_obj = cls.from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/model.py", line 521, in from_config
    return functional_from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/functional.py", line 477, in functional_from_config
    process_layer(layer_data)
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/functional.py", line 457, in process_layer
    layer = saving_utils.model_from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/saving_utils.py", line 85, in model_from_config
    return serialization.deserialize_keras_object(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/serialization.py", line 504, in deserialize_keras_object
    deserialized_obj = cls.from_config(cls_config)
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/layers/rnn/lstm.py", line 679, in from_config
    return cls(**config)
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/layers/rnn/lstm.py", line 486, in __init__
    super().__init__(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/layers/rnn/rnn.py", line 204, in __init__
    super().__init__(**kwargs)
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/layers/layer.py", line 266, in __init__
    raise ValueError(
ValueError: Unrecognized keyword arguments passed to LSTM: {'time_major': False}
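
The ValueError points at a `time_major` entry in the saved layer config that the newer LSTM constructor no longer accepts. One possible workaround, sketched under assumptions, is to drop that key from the model config before the layers are rebuilt. The function name `strip_legacy_kwargs` and the example config (including `units: 512`) are hypothetical, and this has not been tested against the DeepFRI weights.

```python
def strip_legacy_kwargs(node, legacy_keys=("time_major",)):
    """Return a copy of a nested config with the given keys removed at every depth."""
    if isinstance(node, dict):
        return {
            key: strip_legacy_kwargs(value, legacy_keys)
            for key, value in node.items()
            if key not in legacy_keys
        }
    if isinstance(node, list):
        return [strip_legacy_kwargs(item, legacy_keys) for item in node]
    return node


# Hypothetical config fragment shaped like a Keras functional model config.
example = {
    "class_name": "Functional",
    "config": {
        "layers": [
            {"class_name": "LSTM", "config": {"units": 512, "time_major": False}},
        ]
    },
}
cleaned = strip_legacy_kwargs(example)
print(cleaned["config"]["layers"][0]["config"])
```

Applying this to the real model would mean reading the JSON model config stored in the .hdf5 file and rebuilding from the cleaned version, and other incompatibilities may still surface afterwards. An untested alternative may be the tf_keras package with TF_USE_LEGACY_KERAS=1, which is meant to keep tf.keras on Keras 2.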

If anyone is able to give more information about a working installation of their own (that doesn't require a downgrade of CUDA) that'd be super useful!

Much appreciated,
Jasper
