
[BUG]: ModuleNotFoundError: No module named 'colossalai.context.parallel_mode' #4980

Closed
vetmax7 opened this issue Oct 26, 2023 · 1 comment
Labels
bug Something isn't working

@vetmax7
vetmax7 commented Oct 26, 2023

🐛 Describe the bug

Hello!

I tried to run train.sh for FastFold (https://github.com/hpcaitech/FastFold), but I got the errors below. Could you please help me?

colossalai 0.3.3 pypi_0 pypi

/opt/conda/envs/pytorch/lib/python3.8/site-packages/colossalai/kernel/cuda_native/mha/flash_attn_2.py:21: UserWarning: FlashAttention only supports Ampere GPUs or newer.
  warnings.warn("FlashAttention only supports Ampere GPUs or newer.")
/opt/conda/envs/pytorch/lib/python3.8/site-packages/colossalai/kernel/cuda_native/mha/flash_attn_2.py:28: UserWarning: please install flash_attn from https://github.com/HazyResearch/flash-attention
  warnings.warn("please install flash_attn from https://github.com/HazyResearch/flash-attention")
Traceback (most recent call last):
  File "train.py", line 11, in <module>
    from fastfold.utils.inject_fastnn import inject_fastnn
  File "/FastFold/fastfold/utils/inject_fastnn.py", line 17, in <module>
    from fastfold.model.fastnn import EvoformerStack, ExtraMSAStack
  File "/FastFold/fastfold/model/fastnn/__init__.py", line 1, in <module>
    from .msa import MSACore, ExtraMSACore, ExtraMSABlock, ExtraMSAStack
  File "/FastFold/fastfold/model/fastnn/msa.py", line 21, in <module>
    from colossalai.context.parallel_mode import ParallelMode
ModuleNotFoundError: No module named 'colossalai.context.parallel_mode'


ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 144182) of binary: /opt/conda/envs/pytorch/bin/python
Traceback (most recent call last):
  File "/opt/conda/envs/pytorch/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.12.0', 'console_scripts', 'torchrun')())
  File "/opt/conda/envs/pytorch/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/envs/pytorch/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/envs/pytorch/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/envs/pytorch/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

============================================================
train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2023-10-26_16:44:54
host : volta01.hpc.local
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 144182)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Environment

No response

@vetmax7 vetmax7 added the bug Something isn't working label Oct 26, 2023
@Fridge003
Contributor

Fridge003 commented Oct 27, 2023

Hi, colossalai.context.parallel_mode has been deprecated and moved to legacy in the latest versions of ColossalAI (it can now be imported as colossalai.legacy.context.parallel_mode). Downgrading ColossalAI to an older version (below 0.3.0) should solve this issue.
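
If downgrading is not an option, a try/except import shim in fastfold/model/fastnn/msa.py is another possible workaround. This is only a minimal sketch, assuming the legacy module in newer ColossalAI releases still exports the same ParallelMode symbol that FastFold expects:

# Minimal compatibility shim (sketch, not an official FastFold patch),
# assuming colossalai.legacy.context.parallel_mode provides the same
# ParallelMode enum used by fastfold/model/fastnn/msa.py.
try:
    # Older ColossalAI releases (< 0.3.0) keep the module here.
    from colossalai.context.parallel_mode import ParallelMode
except ModuleNotFoundError:
    # Newer releases moved the deprecated module under `legacy`.
    from colossalai.legacy.context.parallel_mode import ParallelMode

Alternatively, pinning an older release (for example, pip install "colossalai<0.3.0") follows the downgrade suggestion above.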
