transformers support generation, trainer, tutorial, etc. #748
base: master
Conversation
python finetune_in_native_mindspore.py \
    --model_path meta-llama/Meta-Llama-3-8B \
    --dataset_path Yelp/yelp_review_full \
delete
ok
nn.AvgPool1d,
nn.AvgPool2d,
nn.AvgPool3d,
nn.CrossEntropyLoss,
These hard-coded fp32 layers may not fit all models.
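Following up on this comment, one way to avoid a single hard-coded list would be a configurable allowlist that each model can extend or shrink. The sketch below is a hypothetical illustration in plain Python, not the mindone API; the default set mirrors the layer names in the diff above.

```python
# Sketch: a configurable keep-in-fp32 set instead of one hard-coded list.
# Class names are strings here for illustration; a real version would hold
# the nn.AvgPool1d / nn.CrossEntropyLoss classes themselves.
DEFAULT_KEEP_FP32 = {"AvgPool1d", "AvgPool2d", "AvgPool3d", "CrossEntropyLoss"}

def layers_to_keep_fp32(model_layer_names, extra=None, remove=None):
    """Return which of the model's layer class names should stay in fp32.

    `extra` / `remove` let each model adjust the default allowlist rather
    than forcing one list on all models, which is the reviewer's concern.
    """
    keep = set(DEFAULT_KEEP_FP32)
    if extra:
        keep |= set(extra)
    if remove:
        keep -= set(remove)
    # Only report names that actually occur in this model.
    return {name for name in model_layer_names if name in keep}
```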
return out

class FlashAttention2(nn.Cell):
It may be worth stating that this is just a wrapper, not the real FA2.
I heard before that it really is FA2.
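Whichever way the wrapper question resolves, the pattern under discussion can be sketched as a layer that dispatches to a fused FA2 kernel when one is available and falls back to a reference implementation otherwise. The names below are hypothetical, not the actual mindone interface:

```python
# Hypothetical dispatch pattern: use the real fused FA2 kernel if present,
# otherwise fall back to an eager-mode reference path.
class FlashAttention2Wrapper:
    def __init__(self, fused_kernel=None):
        # fused_kernel: callable implementing the real FA2 kernel, or None.
        self.fused_kernel = fused_kernel

    def construct(self, q, k, v):
        if self.fused_kernel is not None:
            return self.fused_kernel(q, k, v)   # real FA2 path
        return self.fallback(q, k, v)           # reference path

    @staticmethod
    def fallback(q, k, v):
        # Placeholder; a real fallback would compute
        # softmax(q @ k.T / sqrt(d)) @ v in eager mode.
        return ("fallback", q, k, v)
```

Documenting which branch a given build actually takes would answer the reviewer's question directly.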
What about using mint AdamW?
We will switch to mint uniformly later.
Is this not necessary in PyNative mode?
mindone/transformers/README.md (outdated)
Could you list the supported features compared to torch? For example, beam_search for generation and 8-bit quantization for memory reduction are quite commonly used, but they seem to be missing from this PR.
Currently only the most basic sampling method for generation is supported, provided for MLLMs to use; no quantization is supported.
The full interface, including beam_search for generation, will be provided in the subsequent version 4.46.2.
mindone/transformers/README.md (outdated)
It would be better to state which MindSpore version and mode (graph/PyNative) are mainly tested. If both graph and PyNative modes are supported, do both guarantee good accuracy, or are they just both runnable?
The newly added features were validated on MindSpore 2.3.1, but some existing interfaces may not be supported; complete validation is required before a confirmed supported version can be announced publicly.
For inference/generation it achieves good accuracy; for training it is currently only in a runnable state.
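For readers following this thread, the mode in question is selected through MindSpore's context API. A config fragment (not run here); the `device_target` line is an assumption and depends on your hardware:

```python
import mindspore as ms

# PyNative (eager) mode, the path validated for inference/generation above.
ms.set_context(mode=ms.PYNATIVE_MODE)

# Alternative: graph mode, currently described as runnable for training.
# ms.set_context(mode=ms.GRAPH_MODE)

# Assumption: adjust to your backend, e.g. "Ascend", "GPU", or "CPU".
# ms.set_context(device_target="Ascend")
```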
What does this PR do?
Adds # (feature)
Before submitting
What's New
Here are the documentation guidelines.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@xxx