I had a problem running AlphaFold. The first two hours went very smoothly, and I think the MSA part finished in those two hours. However, it then showed:
I0905 13:06:56.466166 140453353674560 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (691, 691, 64)}, 'experimentally_resolved': {'logits': (691, 37)}, 'masked_msa': {'logits': (252, 691, 22)}, 'predicted_aligned_error': (691, 691), 'predicted_lddt': {'logits': (691, 50)}, 'structure_module': {'final_atom_mask': (691, 37), 'final_atom_positions': (691, 37, 3)}, 'plddt': (691,), 'aligned_confidence_probs': (691, 691, 64), 'max_predicted_aligned_error': (), 'ptm': (), 'iptm': (), 'ranking_confidence': ()}
I0905 13:06:56.467109 140453353674560 run_alphafold.py:202] Total JAX model model_1_multimer_v2_pred_0 on VHVL predict time (includes compilation time, see --benchmark): 246.2s
After that, this step seemed to take forever. I checked the CPU usage, memory usage, and GPU usage:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
35488 dell 20 0 69.9g 4.8g 594148 R 100.0 3.8 1591:11 python /h+
total used free shared buff/cache available
Mem: 128357 6557 1730 106 120069 121081
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04 Driver Version: 515.43.04 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:3B:00.0 Off | N/A |
| 30% 33C P2 101W / 320W | 5886MiB / 10240MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:5E:00.0 Off | N/A |
| 30% 25C P0 88W / 320W | 0MiB / 10240MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... Off | 00000000:B1:00.0 Off | N/A |
| 30% 25C P0 89W / 320W | 0MiB / 10240MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce ... Off | 00000000:D9:00.0 Off | N/A |
| 30% 25C P0 94W / 320W | 0MiB / 10240MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 35488 C python 1020MiB |
+-----------------------------------------------------------------------------+
The GPU memory usage is not very high, since I have seen some people's A100s with a memory usage of over 20000MiB. What's more, the GPU utilization is only 0-1%. I'm not sure whether this is because the graphics driver/CUDA/cuDNN/JAX versions are mismatched (driver version: 515.43.04, CUDA version: 11.7, cuDNN version: 8.4.1.50, jaxlib version: 0.3.15+cuda11.cudnn82, Python version: 3.8). I didn't see any error log, but it just didn't move on for over 30 hours. I also ran 'conda activate alphafold' and tested in python3:
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> from torch.backends import cudnn
>>> print(cudnn.is_available())
True
It seems that CUDA and cuDNN work. So I'm confused: has anyone had this problem before, and could you please kindly tell me how to solve it? Thanks a lot for your kind guidance.
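Side note: AlphaFold runs on JAX rather than PyTorch, so the torch check above may not reflect what AlphaFold itself sees. A more direct test, as a minimal sketch in the same conda environment, is to ask JAX which backend it picked:

>>> import jax
>>> # Expect 'gpu' here; 'cpu' would mean jaxlib silently fell back to the CPU.
>>> print(jax.default_backend())
>>> # Expect GpuDevice entries; a list of only CpuDevice would confirm the fallback.
>>> print(jax.devices())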
Sorry for the late response. What you report sounds like a very big sequence. Can you check the input, especially the sequence size, and if possible provide us with that input so that we can check it?
Please also have a look at this table. AlphaFold might take more than a day to predict the structure for a sequence of 3500 AAs.
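If it helps, the total length is quick to check with a few lines of Python (a minimal sketch; input.fasta is a placeholder for your actual input path):

# Sum residues per chain in the input FASTA (path is a placeholder).
lengths = {}
name = None
with open("input.fasta") as fh:
    for line in fh:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
for name, n in lengths.items():
    print(name, n, "residues")
print("total:", sum(lengths.values()), "residues")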