Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Sequence Parallelism #6506

Open
wants to merge 24 commits into
base: main
Choose a base branch
from
Open

add Sequence Parallelism #6506

wants to merge 24 commits into from

Conversation

HaoshengZou
Copy link

@HaoshengZou HaoshengZou commented Jan 2, 2025

What does this PR do?

add Sequence Parallelism (#4733 #5024 #5207 #5815 #5841 etc.)
direct plug&play use at https://github.com/Qihoo360/360-LLaMA-Factory

We have a separate README and chat-group at https://github.com/Qihoo360/360-LLaMA-Factory, only for Sequence Parallelism part. They are not to be merged.
We developed based on LLaMA-Factory's latest release v0.9.1. We also based on https://github.com/zhuzilin/ring-flash-attention. The original repos are fully acknowledged.
We developed this at 360. I am PhD from Tsinghua-CS Prof. Jun Zhu's group.

Feel free to review and comment on changes as you see fit. We'll make it better.
Thank you!

Before submitting

Copy link
Owner

@hiyouga hiyouga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Haosheng, thanks a lot for your contribution, we have left some comments, could you kindly revise the code according them?

requirements.txt Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
src/llamafactory/train/sft/workflow.py Outdated Show resolved Hide resolved
src/llamafactory/train/sft/trainer.py Show resolved Hide resolved
@hiyouga hiyouga added the pending This problem is yet to be addressed label Jan 8, 2025
@hiyouga hiyouga self-requested a review January 10, 2025 06:16
@LiuLinyun
Copy link

我看里面的 seq padding 是直接 pad 到 cutoff_len, 不知道我对这个提交的理解是否有偏差。若样本长度普遍偏短,是否会出现计算浪费?
I see it is padded to cutoff_len, not max len of a micro batch, whether am I misunderstood. If most samples are shoter then cutoff_len, there are a lot of computation waste.

@HaoshengZou
Copy link
Author

我看里面的 seq padding 是直接 pad 到 cutoff_len, 不知道我对这个提交的理解是否有偏差。若样本长度普遍偏短,是否会出现计算浪费? I see it is padded to cutoff_len, not max len of a micro batch, whether am I misunderstood. If most samples are shoter then cutoff_len, there are a lot of computation waste.

packing would be recommended (and is widely used) for such SFT scenarios.

@hiyouga
Copy link
Owner

hiyouga commented Jan 17, 2025

Hi Haosheng, sorry for the delay in our processing. Recently, we are busy in work and it's quite difficult to merge it. The expected time of finish is before Jan 31st.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pending This problem is yet to be addressed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants