load GPT-J from HF #39
base: master
Conversation
@@ -89,6 +91,21 @@ def __init__(self, config, device=None):
        **attn_config,
    )

    #check weights contiguous
Is there a specific reason why we need to check this?
The weights for GPT-J's attention layers are not contiguous, which raises `Tensors must be contiguous` in DeepSpeed.
I found a similar issue here.
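A minimal sketch of such a contiguity fix, assuming it runs over the language model's parameters right after loading (the helper name and placement are illustrative, not the exact code in this PR):

```python
import torch

def ensure_contiguous_weights(model: torch.nn.Module) -> None:
    """Make every parameter contiguous so DeepSpeed does not raise
    'Tensors must be contiguous' when it communicates them."""
    for name, param in model.named_parameters():
        if not param.data.is_contiguous():
            # .contiguous() copies the tensor into a contiguous memory layout
            param.data = param.data.contiguous()
```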
Hi, thanks again for another PR. It would be nice to use the official HF version for MAGMA. However, the last time we tried to implement this we noticed a slight difference in model outputs that we could not really get to the bottom of. I'm very careful with these kinds of changes, so it would be great if you could compare the logits for some example inputs before/after your change. Best, Constantin
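A sketch of the kind of before/after comparison being asked for, assuming both the old and the new GPT-J wrappers return HF-style outputs with a `.logits` field (the function name and prompt handling are illustrative):

```python
import torch

@torch.no_grad()
def compare_logits(old_model, new_model, tokenizer, prompts, atol=1e-4):
    old_model.eval()
    new_model.eval()
    for prompt in prompts:
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids
        old_logits = old_model(input_ids).logits
        new_logits = new_model(input_ids).logits
        max_diff = (old_logits - new_logits).abs().max().item()
        print(f"{prompt!r}: max |logit diff| = {max_diff:.3e}")
        assert torch.allclose(old_logits, new_logits, atol=atol), "outputs diverge"
```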
Hi Constantin, thank you for reading my changes. Regarding the slight difference, it is hard for me to explain it from the example inputs/outputs right now. However, a change in the structure of the model may be the cause.

before:

at present (without adding adapters):

GPT-J:

Hope this helps you! Best, Changxu
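One way to make such a structural difference concrete, assuming both variants are available as `torch.nn.Module`s (this helper is illustrative and not code from the PR), is to diff their parameter names and shapes:

```python
import torch

def diff_module_structure(old_model: torch.nn.Module, new_model: torch.nn.Module) -> None:
    # List parameters that exist in only one variant, or whose shapes changed,
    # to make the structural change between the two GPT-J versions explicit.
    old = {n: tuple(p.shape) for n, p in old_model.named_parameters()}
    new = {n: tuple(p.shape) for n, p in new_model.named_parameters()}
    for name in sorted(set(old) - set(new)):
        print(f"only in old: {name} {old[name]}")
    for name in sorted(set(new) - set(old)):
        print(f"only in new: {name} {new[name]}")
    for name in sorted(set(old) & set(new)):
        if old[name] != new[name]:
            print(f"shape changed: {name} {old[name]} -> {new[name]}")
```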
Hmm, this puzzles me a bit, but in any case, unless consistent behavior with the old version is ensured (e.g. by checking that all the hidden states are the same for a couple of example inputs), I will not merge these changes. Let me know if you manage to do it, and thanks for the effort. Best, Constantin
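A sketch of that hidden-state check, assuming both models accept `output_hidden_states=True` and return HF-style outputs (an assumption for the pre-existing MAGMA wrapper):

```python
import torch

@torch.no_grad()
def compare_hidden_states(old_model, new_model, tokenizer, prompts, atol=1e-4):
    for prompt in prompts:
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids
        old_out = old_model(input_ids, output_hidden_states=True)
        new_out = new_model(input_ids, output_hidden_states=True)
        # Compare every layer's hidden states, not just the final logits.
        for layer, (h_old, h_new) in enumerate(
            zip(old_out.hidden_states, new_out.hidden_states)
        ):
            max_diff = (h_old - h_new).abs().max().item()
            status = "OK" if torch.allclose(h_old, h_new, atol=atol) else "MISMATCH"
            print(f"{prompt!r} layer {layer}: max diff {max_diff:.3e} [{status}]")
```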
Hello, did you manage to make this work?
Hi,
Hi, thank you for the fast answer!
The data I use for fine-tuning is in a completely different domain, so I'm afraid my checkpoint can't meet your needs right now.
Ah, ok, thank you anyway.
load the GPT-J checkpoint
newer version of transformers
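For reference, loading the official GPT-J checkpoint from the Hugging Face Hub with a recent transformers release would look roughly like this; the model id, dtype, and prompt are assumptions for illustration, not necessarily what this PR uses:

```python
import torch
from transformers import AutoTokenizer, GPTJForCausalLM

# "EleutherAI/gpt-j-6B" is the official HF checkpoint; fp16 keeps memory manageable.
model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

prompt = "MAGMA is a multimodal model that"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(output[0]))
```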