Saving Unsloth Model to Regular Huggingface Format #1410

Open · hojmax opened this issue Dec 10, 2024 · 2 comments

@hojmax commented Dec 10, 2024

I am trying to finetune a model with Unsloth and evaluate the finetuned model with lm_eval. However, when I save the Unsloth model after training, loading the checkpoint breaks lm_eval. Minimal reproducible example (without the finetuning; just loading and saving):

from lm_eval import simple_evaluate
from unsloth import FastLanguageModel

# Load with unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="../base_llama_3b_instruct",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=False,
)

# Save the model and tokenizer
checkpoint_path = "other_base_llama_3b_instruct"
model.save_pretrained(checkpoint_path)
tokenizer.save_pretrained(checkpoint_path)

# Evaluate the model
results = simple_evaluate(
    model="hf",
    model_args=f"pretrained={checkpoint_path}",
    tasks=["tinyMMLU"],
)

This gives:

root@9adce86cc0bf:/app/evaluation# python temp.py 
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2024.12.4: Fast Llama patching. Transformers:4.46.2.
   \\   /|    GPU: NVIDIA A10G. Max memory: 21.975 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.0+cu124. CUDA: 8.6. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
2024-12-10:16:11:37,332 INFO     [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2024-12-10:16:11:37,332 INFO     [evaluator.py:201] Initializing hf model, with arguments: {'pretrained': 'other_base_llama_3b_instruct'}
2024-12-10:16:11:37,370 INFO     [huggingface.py:129] Using device 'cuda'
2024-12-10:16:11:37,371 INFO     [huggingface.py:481] Using model type 'default'
2024-12-10:16:11:37,895 INFO     [huggingface.py:365] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda'}
2024-12-10:16:11:47,707 INFO     [task.py:415] Building contexts for tinyMMLU on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 3292.65it/s]
2024-12-10:16:11:47,744 INFO     [evaluator.py:489] Running loglikelihood requests
Running loglikelihood requests:   0%|                                                                                                                                | 0/400 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/app/evaluation/temp.py", line 18, in <module>
    results = simple_evaluate(
              ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/evaluator.py", line 301, in simple_evaluate
    results = evaluate(
              ^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/evaluator.py", line 500, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/api/model.py", line 378, in loglikelihood
    return self._loglikelihood_tokens(new_reqs, disable_tqdm=disable_tqdm)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/models/huggingface.py", line 1119, in _loglikelihood_tokens
    self._model_call(batched_inps, **call_kwargs), dim=-1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/models/huggingface.py", line 832, in _model_call
    return self.model(inps).logits
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 969, in _CausalLM_fast_forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 832, in LlamaModel_fast_forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 493, in LlamaDecoderLayer_fast_forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 362, in LlamaAttention_fast_forward
    Q, K, V = self.apply_qkv(self, hidden_states)
              ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1931, in __getattr__
    raise AttributeError(
AttributeError: 'LlamaSdpaAttention' object has no attribute 'apply_qkv'
Running loglikelihood requests:   0%| 

I also tried using model.save_pretrained_merged() without any luck. How should I do this?
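
For reference, the merged-save attempt looked roughly like this (save_method values taken from the Unsloth docs; the exact signature may vary by version):

# Merged-save attempt, per the Unsloth docs (signature may vary by version)
model.save_pretrained_merged(
    checkpoint_path,
    tokenizer,
    save_method="merged_16bit",
)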

@dame-cell
Contributor

This is interesting. There could be a few possible causes here:

  • Maybe lm_eval installs a newer version of transformers than the one unsloth uses.
  • I haven't kept up with the transformers code, but it seems they may have updated the llama modeling file. Looking at the unsloth code, it uses apply_qkv to get Q, K, V, but huggingface no longer has apply_qkv; the attention now looks like this:
bsz, q_len, _ = hidden_states.size()

query_states = self.q_proj(hidden_states)
key_states = self.k_proj(hidden_states)
value_states = self.v_proj(hidden_states)

# use -1 to infer num_heads and num_key_value_heads as they may vary if tensor parallel is used
query_states = query_states.view(bsz, q_len, -1, self.head_dim).transpose(1, 2)
key_states = key_states.view(bsz, q_len, -1, self.head_dim).transpose(1, 2)
value_states = value_states.view(bsz, q_len, -1, self.head_dim).transpose(1, 2)

First, check whether the transformers version you are using matches the one unsloth expects. If that doesn't fix it, I guess we might have to modify the model code again.
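
To make the mismatch concrete, here is a self-contained toy sketch (all names are illustrative except apply_qkv, which matches the traceback): the patched fast forward assumes an apply_qkv callable was attached to each attention module at patch time, so a module that lm_eval loads fresh from the checkpoint, while still running the patched forward, hits the AttributeError.

import torch
import torch.nn as nn

# Toy attention module with the same projections as the HF snippet above.
class ToyAttention(nn.Module):
    def __init__(self, dim=8):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)

    def forward(self, x):
        # "Fast" path: expects apply_qkv to have been attached at patch time.
        return self.apply_qkv(self, x)

def apply_qkv(module, x):
    return module.q_proj(x), module.k_proj(x), module.v_proj(x)

patched = ToyAttention()
patched.apply_qkv = apply_qkv             # what the patching step does
q, k, v = patched(torch.randn(1, 4, 8))   # works

fresh = ToyAttention()                    # what lm_eval loads from disk
try:
    fresh(torch.randn(1, 4, 8))
except AttributeError as e:
    print(e)  # 'ToyAttention' object has no attribute 'apply_qkv'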

@danielhanchen
Contributor

Apologies for the delay. The issue is that Unsloth patches over transformers, so it's best to reload transformers once you've finished with Unsloth.

i.e.

import transformers
import importlib

# Re-import transformers so the un-patched classes are used from here on
importlib.reload(transformers)
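
Putting that together with the repro above, the evaluation script would look roughly like this (a sketch; whether reload(transformers) fully restores every patched submodule may depend on the Unsloth and transformers versions, so running the evaluation in a fresh process is the conservative alternative):

from unsloth import FastLanguageModel

# Load (and, in the real script, finetune) with Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="../base_llama_3b_instruct",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=False,
)

checkpoint_path = "other_base_llama_3b_instruct"
model.save_pretrained(checkpoint_path)
tokenizer.save_pretrained(checkpoint_path)

# Undo Unsloth's monkey-patches before lm_eval re-loads the model.
import importlib
import transformers
importlib.reload(transformers)

from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args=f"pretrained={checkpoint_path}",
    tasks=["tinyMMLU"],
)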
