
no_sync context manager is incompatible with gradient partitioning logic of ZeRO stage 3 #6194

Open
1 task done
ioir123ju opened this issue Nov 29, 2024 · 2 comments
Labels
pending This problem is yet to be addressed

Comments

@ioir123ju

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-5.15.0-122-generic-x86_64-with-glibc2.35
  • Python version: 3.10.15
  • PyTorch version: 2.5.1+cu124 (GPU)
  • Transformers version: 4.46.1
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA L20
  • DeepSpeed version: 0.16.0

Reproduction

FORCE_TORCHRUN=1 llamafactory-cli train examples/qwen2_vl_full_sft.yaml
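The issue title is the text of an assertion raised when something enters `no_sync()` on a ZeRO engine. Below is a minimal, self-contained sketch of the kind of guard that produces this message. It assumes the check rejects every ZeRO stage that partitions gradients (taken here to be stage 2 and above, which is an assumption). `ToyEngine`, `partitions_gradients`, and `try_no_sync` are hypothetical names for illustration, not DeepSpeed code.

```python
from contextlib import contextmanager


class ToyEngine:
    """Hypothetical stand-in for a ZeRO-wrapped training engine (not DeepSpeed's class)."""

    def __init__(self, zero_stage: int):
        self.zero_stage = zero_stage

    def partitions_gradients(self) -> bool:
        # Assumption: gradients are sharded across ranks from ZeRO stage 2 upward.
        return self.zero_stage >= 2

    @contextmanager
    def no_sync(self):
        # Skipping the gradient all-reduce only makes sense when every rank
        # holds full gradients, so the guard refuses when they are partitioned.
        # An explicit raise is used so the check also runs under `python -O`.
        if self.partitions_gradients():
            raise AssertionError(
                "no_sync context manager is incompatible with gradient "
                f"partitioning logic of ZeRO stage {self.zero_stage}"
            )
        yield


def try_no_sync(stage: int) -> str:
    """Enter no_sync() on a toy engine and report "ok" or the error text."""
    engine = ToyEngine(stage)
    try:
        with engine.no_sync():
            pass  # a real loop would run a backward pass here without syncing
        return "ok"
    except AssertionError as err:
        return str(err)


if __name__ == "__main__":
    for stage in (0, 1, 2, 3):
        print(stage, try_no_sync(stage))
```

Under this assumption, stages 0 and 1 pass through, while stages 2 and 3 fail as soon as a caller (presumably a gradient-accumulation step in the training loop) enters `no_sync()`, matching the stage 3 message in the title.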

Expected behavior

No response

Others

No response

github-actions bot added the pending label (This problem is yet to be addressed) on Nov 29, 2024
@khazic

This comment was marked as resolved.

@hiyouga
Owner

hiyouga commented Nov 29, 2024
