
[BUG] DeepSpeed errors when running BLOOM #67

Open
jataylo opened this issue Jun 30, 2023 · 1 comment
Labels
bug Something isn't working

@jataylo

jataylo commented Jun 30, 2023

Describe the bug
I am facing issues getting the BLOOM model to run with DeepSpeed using tip-of-tree (ToT) upstream PyTorch.

The first slew of errors observed is resolved by @rraminen's workaround on the transformer_inference branch.

This occurs on both ROCm 5.4.2 and ROCm 5.5.

Log snippet:
deepspeed_error.txt

/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/ATen.h:4:2: error: #error C++17 or later compatible compiler is required to use ATen.
    4 | #error C++17 or later compatible compiler is required to use ATen.
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/core/ivalue_inl.h: In lambda function:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/core/ivalue_inl.h:1061:30: error: ‘is_convertible_v’ is not a member of ‘std’; did you mean ‘is_convertible’?
1061 |         if constexpr (::std::is_convertible_v<typename c10::invoke_result_t<T &&, Future&>, IValueWithStorages>) {

To Reproduce
Docker image: rocm/pytorch-private:BLOOM_DeepSpeed_tranformer_inference_enabled_tot_issue

Steps to reproduce the behavior:

  1. Build upstream PyTorch and the transformer_inference ROCm DeepSpeed branch
  2. git clone https://github.com/huggingface/transformers-bloom-inference
  3. deepspeed --num_gpus 1 transformers-bloom-inference/bloom-inference-scripts/bloom-ds-inference.py --name bigscience/bloom-560m

ds_report output
DeepSpeed general environment info:
torch install path ............... ['/opt/conda/lib/python3.8/site-packages/torch']
torch version .................... 2.1.0a0+gitfde024b
deepspeed install path ........... ['/opt/conda/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.9.3+44c0bbfe, 44c0bbf, transformer_inference
torch cuda version ............... None
torch hip version ................ 5.5.30201-c1741e9b
nvcc version ..................... None
deepspeed wheel compiled w. ...... torch 2.0, hip 5.5

@jataylo jataylo added the bug Something isn't working label Jun 30, 2023
@jataylo jataylo changed the title [BUG] DeepSpeed compilation errors with BLOOM [BUG] DeepSpeed errors when running BLOOM Jun 30, 2023
@jataylo
Author

jataylo commented Jun 30, 2023

cc: @jithunnair-amd @dllehr-amd

2 participants