🐛 Describe the bug

Hello!
I tried to run train.sh for FastFold (https://github.com/hpcaitech/FastFold), but I got the following errors. Could you help me, please?

colossalai 0.3.3 pypi_0 pypi

/opt/conda/envs/pytorch/lib/python3.8/site-packages/colossalai/kernel/cuda_native/mha/flash_attn_2.py:21: UserWarning: FlashAttention only supports Ampere GPUs or newer.
  warnings.warn("FlashAttention only supports Ampere GPUs or newer.")
/opt/conda/envs/pytorch/lib/python3.8/site-packages/colossalai/kernel/cuda_native/mha/flash_attn_2.py:28: UserWarning: please install flash_attn from https://github.com/HazyResearch/flash-attention
  warnings.warn("please install flash_attn from https://github.com/HazyResearch/flash-attention")
Traceback (most recent call last):
  File "train.py", line 11, in <module>
    from fastfold.utils.inject_fastnn import inject_fastnn
  File "/FastFold/fastfold/utils/inject_fastnn.py", line 17, in <module>
    from fastfold.model.fastnn import EvoformerStack, ExtraMSAStack
  File "/FastFold/fastfold/model/fastnn/__init__.py", line 1, in <module>
    from .msa import MSACore, ExtraMSACore, ExtraMSABlock, ExtraMSAStack
  File "/FastFold/fastfold/model/fastnn/msa.py", line 21, in <module>
    from colossalai.context.parallel_mode import ParallelMode
ModuleNotFoundError: No module named 'colossalai.context.parallel_mode'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 144182) of binary: /opt/conda/envs/pytorch/bin/python
Traceback (most recent call last):
  File "/opt/conda/envs/pytorch/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.12.0', 'console_scripts', 'torchrun')())
  File "/opt/conda/envs/pytorch/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/envs/pytorch/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/envs/pytorch/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/envs/pytorch/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train.py FAILED
Failures:
  <NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
  time      : 2023-10-26_16:44:54
  host      : volta01.hpc.local
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 144182)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Environment

No response

Hi, colossalai.context.parallel_mode has been deprecated and moved to legacy in the newest version of ColossalAI (it can be imported through colossalai.legacy.context.parallel_mode). Downgrading ColossalAI to an older version (below 0.3.0) should solve this issue.
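The suggested fix can also be made version-agnostic with a small import fallback instead of pinning an older release. This is only a sketch, not FastFold's actual code: `first_importable` is a hypothetical helper, and the two module paths come from the error message and the reply above.

```python
import importlib


def first_importable(*names):
    """Return the first module in `names` that imports successfully."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError:
            continue
    raise ModuleNotFoundError(f"none of {names!r} could be imported")


# In fastfold/model/fastnn/msa.py, the pre-0.3.0 path and the legacy
# path used by newer ColossalAI could then be tried in order:
#   pm = first_importable("colossalai.context.parallel_mode",
#                         "colossalai.legacy.context.parallel_mode")
#   ParallelMode = pm.ParallelMode
```

Alternatively, downgrading as suggested (for example, `pip install "colossalai<0.3.0"`) avoids touching FastFold's sources at all.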