
If I want to bypass deepspeed for finetuning, can I do it by calling model.step() directly during training? #172

Open
cocoshe opened this issue Feb 16, 2024 · 2 comments

Comments


cocoshe commented Feb 16, 2024

(screenshot attached)

Or is there some way (or which parts would need to be modified) to remove the dependency on deepspeed entirely?

@1049451037 (Member) commented

If you don't need model parallelism, the ZeRO optimizer, or other such techniques, the model constructed by sat can be used just like a normal PyTorch module.

import torch
from sat import AutoModel

model, args = AutoModel.from_pretrained("bert-base-uncased")
model = model.cuda()
inputs = {
    'input_ids': torch.LongTensor([[1, 2, 3]]).cuda(),
    'position_ids': torch.LongTensor([[0, 1, 2]]).cuda(),
    'token_type_ids': torch.LongTensor([[0, 0, 0]]).cuda(),
    'attention_mask': torch.LongTensor([[[[1]]]]).cuda(),
}
output = model(**inputs)[0]
loss = output.sum()
loss.backward()
print(loss)
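
Building on that, a finetuning loop that bypasses deepspeed would then just use a standard PyTorch optimizer and call optimizer.step() (rather than deepspeed's model.step()). A minimal sketch, assuming the model really does behave like a plain nn.Module as described above; get_batch() and num_steps are hypothetical stand-ins for a real data loader and training schedule:

import torch
from sat import AutoModel

model, args = AutoModel.from_pretrained("bert-base-uncased")
model = model.cuda()
model.train()

# Any standard PyTorch optimizer; no deepspeed engine is created.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for step in range(num_steps):      # num_steps: hypothetical number of training steps
    inputs = get_batch()           # get_batch(): hypothetical loader returning a dict like the one above
    output = model(**inputs)[0]
    loss = output.sum()            # placeholder objective; use a real finetuning loss here

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()               # plain optimizer step instead of deepspeed's model.step()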


corkiyao commented Sep 9, 2024

(quoting the reply above)

Sorry to bother you, but I'd like to ask: I want to place model parameters on different GPUs, e.g. put one layer on GPU0 and another layer on GPU1. Does SAT support this? My GPU memory is not large enough, so the model needs to be split across devices.
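
For reference, independent of whatever sat itself provides for this, layers can be split across GPUs manually in plain PyTorch by moving submodules to different devices and moving activations between them in forward(); autograd then routes gradients back across devices automatically. A minimal sketch with a hypothetical two-block model:

import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    """Hypothetical model with one block per GPU."""
    def __init__(self):
        super().__init__()
        self.block1 = nn.Linear(1024, 1024).to('cuda:0')
        self.block2 = nn.Linear(1024, 1024).to('cuda:1')

    def forward(self, x):
        x = self.block1(x.to('cuda:0'))
        # Hand the activation to the second GPU before the next block.
        x = self.block2(x.to('cuda:1'))
        return x

model = TwoGPUModel()
out = model(torch.randn(2, 1024))
out.sum().backward()  # gradients flow back across both devices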
