[BUG] error :past_key, past_value = layer_past,how to solve this ? #6522

Open
lovychen opened this issue Sep 11, 2024 · 1 comment
Labels
bug (Something isn't working), deepspeed-chat (Related to DeepSpeed-Chat)

lovychen commented Sep 11, 2024

Describe the bug
The error occurs when I run RLHF step 3 training with the following script:

Actor_Lr=9.65e-6
Critic_Lr=5e-6

#--data_path Dahoas/rm-static \
#--offload_reference_model \
deepspeed --master_port 12346 main_step3.py \
   --data_path ${data_path}/beyond/rlhf-reward-single-round-trans_chinese_step3 \
   --data_split 2,4,4 \
   --actor_model_name_or_path $ACTOR_MODEL_PATH \
   --critic_model_name_or_path $CRITIC_MODEL_PATH \
   --data_output_path ${data_path}/train_data_file_step3  \
   --num_padding_at_beginning 1 \
   --per_device_generation_batch_size 1 \
   --per_device_training_batch_size 1 \
   --generation_batches 1 \
   --ppo_epochs 1 \
   --max_answer_seq_len 256 \
   --max_prompt_seq_len 256 \
   --actor_learning_rate ${Actor_Lr} \
   --critic_learning_rate ${Critic_Lr} \
   --actor_weight_decay 0.1 \
   --critic_weight_decay 0.1 \
   --num_train_epochs 1 \
   --lr_scheduler_type cosine \
   --gradient_accumulation_steps 1 \
   --actor_gradient_checkpointing \
   --critic_gradient_checkpointing \
   --actor_dropout 0.0 \
   --num_warmup_steps 100 \
   --deepspeed --seed 1234 \
   --enable_hybrid_engine \
   --actor_zero_stage $ACTOR_ZERO_STAGE \
   --critic_zero_stage $CRITIC_ZERO_STAGE \
   --enable_ema \
   --output_dir $output_path

Log output
I got this error:

[rank3]: ValueError: not enough values to unpack (expected 2, got 0)
[rank1]: Traceback (most recent call last):
[rank1]:   File "/home/deepspeed/DeepSpeedExamples/applications/DeepSpeed-Chat/main_step3.py", line 673, in <module>
[rank1]:     main()
[rank1]:   File "/home/deepspeed/DeepSpeedExamples/applications/DeepSpeed-Chat/main_step3.py", line 527, in main
[rank1]:     out = trainer.generate_experience(batch_prompt['prompt'],
[rank1]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/deepspeed/DeepSpeedExamples/applications/DeepSpeed-Chat/dschat/rlhf/ppo_trainer.py", line 140, in generate_experience
[rank1]:     seq = self._generate_sequence(prompts, mask, step)
[rank1]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/deepspeed/DeepSpeedExamples/applications/DeepSpeed-Chat/dschat/rlhf/ppo_trainer.py", line 87, in _generate_sequence
[rank1]:     seq = self.actor_model.module.generate(
[rank1]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/deepspeed/runtime/hybrid_engine.py", line 253, in generate
[rank1]:     generate_ret_vals = self._generate(*inputs, **kwargs)
[rank1]:                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank1]:     return func(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/transformers/generation/utils.py", line 2024, in generate
[rank1]:     result = self._sample(
[rank1]:              ^^^^^^^^^^^^^
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/transformers/generation/utils.py", line 2982, in _sample
[rank1]:     outputs = self(**model_inputs, return_dict=True)
[rank1]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1609, in _call_impl
[rank1]:     result = forward_call(*args, **kwargs)
[rank1]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/transformers/models/bloom/modeling_bloom.py", line 955, in forward
[rank1]:     transformer_outputs = self.transformer(
[rank1]:                           ^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1609, in _call_impl
[rank1]:     result = forward_call(*args, **kwargs)
[rank1]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/transformers/models/bloom/modeling_bloom.py", line 744, in forward
[rank1]:     outputs = block(
[rank1]:               ^^^^^^
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1609, in _call_impl
[rank1]:     result = forward_call(*args, **kwargs)
[rank1]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 171, in forward
[rank1]:     self.attention(input,
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/deepspeed/ops/transformer/inference/ds_attention.py", line 160, in forward
[rank1]:     context_layer, key_layer, value_layer = self.compute_attention(qkv_out=qkv_out,
[rank1]:                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/deepspeed/ops/transformer/inference/ds_attention.py", line 239, in compute_attention
[rank1]:     past_key, past_value = layer_past
[rank1]:     ^^^^^^^^^^^^^^^^^^^^
[rank1]: ValueError: not enough values to unpack (expected 2, got 0)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
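
The tokenizers warning at the end of the log appears to be separate from the `ValueError`. Following the second option in that message, the variable can be set before the tokenizer is first used. A minimal sketch, assuming it runs at the very top of the training entry point; the helper name `disable_tokenizers_parallelism` is hypothetical:

```python
import os


def disable_tokenizers_parallelism() -> str:
    """Set TOKENIZERS_PARALLELISM=false unless a value is already present."""
    # setdefault keeps any value the user exported in the shell.
    os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
    return os.environ["TOKENIZERS_PARALLELISM"]


if __name__ == "__main__":
    print(disable_tokenizers_parallelism())
```

Exporting the variable in the launch script has the same effect; the Python form is shown only because it keeps the setting next to the code that forks.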

ds_report output

ds_report

[2024-09-11 19:27:52,618] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fp_quantizer ........... [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
gds .................... [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
 [WARNING]  using untested triton version (3.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch']
torch version .................... 2.4.0+cu121
deepspeed install path ........... ['/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/deepspeed']
deepspeed info ................... 0.15.1, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.4, cuda 12.1
shared memory (/dev/shm) size .... 503.77 GB

System info (please complete the following information):

 - OS: Ubuntu 20.04.6 LTS
 - GPU: 4x NVIDIA L20 (46 GB)
 - [DeepSpeed-MII](https://github.com/microsoft/deepspeed-mii) version (if applicable): 0.15.1
 - Hugging Face Transformers: 4.44.2
 - Python: 3.12.0
 - CUDA: 12.1
 - torch: 2.4.0
 - deepspeed: 0.15.1
 - accelerate: 0.33.0
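
For anyone trying to reproduce this, the package versions listed above can be collected in one step. The following is a minimal sketch using only the standard library; `collect_versions` is a hypothetical helper, and the package names passed to it are the ones reported in this issue:

```python
from importlib import metadata
from typing import Dict, Iterable


def collect_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Return the installed version of each package, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    reported = ["torch", "deepspeed", "transformers", "accelerate"]
    for name, version in collect_versions(reported).items():
        print(f"{name}: {version}")
```

Running it inside the same conda environment as the training job avoids reporting versions from a different interpreter.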

Additional context

home/deepspeed/DeepSpeedExamples/applications/DeepSpeed-Chat/dschat/utils/model/model_utils.py:155: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  model_ckpt_state_dict = torch.load(model_ckpt_path, map_location='cpu')

lovychen added the bug and deepspeed-chat labels Sep 11, 2024
lovychen changed the title from "[BUG] past_key, past_value = layer_past,how to solve the error ?" to "[BUG] error :past_key, past_value = layer_past,how to solve this ?" Sep 11, 2024
lovychen (Author) commented

When I comment out this line, it works for me:
# --enable_hybrid_engine \
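
The exception message can be reproduced in isolation: unpacking an empty tuple into two names raises this same `ValueError`. The sketch below assumes, without confirmation from this thread, that `layer_past` reaches `compute_attention` as an empty sequence rather than `None` when the hybrid engine is enabled. `split_layer_past` is a hypothetical helper for illustration and is not part of DeepSpeed.

```python
from typing import Any, Optional, Sequence, Tuple


def split_layer_past(layer_past: Optional[Sequence[Any]]) -> Tuple[Any, Any]:
    """Hypothetical helper: unpack a (key, value) cache entry defensively.

    Both None and an empty sequence are treated as "no cache yet".
    """
    if layer_past is None or len(layer_past) == 0:
        return None, None
    past_key, past_value = layer_past
    return past_key, past_value


def reproduce_error() -> str:
    """Unpack an empty tuple the way the failing line does; return the message."""
    layer_past = ()
    try:
        past_key, past_value = layer_past
    except ValueError as exc:
        return str(exc)
    return ""


if __name__ == "__main__":
    print(reproduce_error())
    print(split_layer_past(()))
    print(split_layer_past(("k", "v")))
```

This only illustrates the failure mode. Whether a guard like this is the right fix inside DeepSpeed, or whether the caller should pass `None`, is not settled here; disabling `--enable_hybrid_engine` remains the only workaround reported in this issue.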
