Saving Unsloth Model to Regular Huggingface Format #1410

hojmax · 2024-12-10T16:14:45Z

I am trying to fine-tune a model with Unsloth and evaluate the fine-tuned model with lm_eval. However, when I save the Unsloth model after training it, loading the checkpoint breaks lm_eval. Minimal reproducible example (without the fine-tuning, just loading and saving):

from lm_eval import simple_evaluate
from unsloth import FastLanguageModel

# Load with unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="../base_llama_3b_instruct",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=False,
)

# Save the model and tokenizer
checkpoint_path = "other_base_llama_3b_instruct"
model.save_pretrained(checkpoint_path)
tokenizer.save_pretrained(checkpoint_path)

# Evaluate the model
results = simple_evaluate(
    model="hf",
    model_args=f"pretrained={checkpoint_path}",
    tasks=["tinyMMLU"],
)

This gives:

root@9adce86cc0bf:/app/evaluation# python temp.py 
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2024.12.4: Fast Llama patching. Transformers:4.46.2.
   \\   /|    GPU: NVIDIA A10G. Max memory: 21.975 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.0+cu124. CUDA: 8.6. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
2024-12-10:16:11:37,332 INFO     [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2024-12-10:16:11:37,332 INFO     [evaluator.py:201] Initializing hf model, with arguments: {'pretrained': 'other_base_llama_3b_instruct'}
2024-12-10:16:11:37,370 INFO     [huggingface.py:129] Using device 'cuda'
2024-12-10:16:11:37,371 INFO     [huggingface.py:481] Using model type 'default'
2024-12-10:16:11:37,895 INFO     [huggingface.py:365] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda'}
2024-12-10:16:11:47,707 INFO     [task.py:415] Building contexts for tinyMMLU on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 3292.65it/s]
2024-12-10:16:11:47,744 INFO     [evaluator.py:489] Running loglikelihood requests
Running loglikelihood requests:   0%|          | 0/400 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/app/evaluation/temp.py", line 18, in <module>
    results = simple_evaluate(
              ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/evaluator.py", line 301, in simple_evaluate
    results = evaluate(
              ^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/evaluator.py", line 500, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/api/model.py", line 378, in loglikelihood
    return self._loglikelihood_tokens(new_reqs, disable_tqdm=disable_tqdm)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/models/huggingface.py", line 1119, in _loglikelihood_tokens
    self._model_call(batched_inps, **call_kwargs), dim=-1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/models/huggingface.py", line 832, in _model_call
    return self.model(inps).logits
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 969, in _CausalLM_fast_forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 832, in LlamaModel_fast_forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 493, in LlamaDecoderLayer_fast_forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 362, in LlamaAttention_fast_forward
    Q, K, V = self.apply_qkv(self, hidden_states)
              ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1931, in __getattr__
    raise AttributeError(
AttributeError: 'LlamaSdpaAttention' object has no attribute 'apply_qkv'
Running loglikelihood requests:   0%|

I also tried using model.save_pretrained_merged() without any luck. How should I do this?


dame-cell · 2024-12-11T15:04:53Z

This is interesting. There are a few possible causes here:

Maybe lm_eval installed a newer version of transformers than the one Unsloth uses.
I have not kept up with the transformers code, but it seems they may have updated the Llama file to now use the code below.
Looking at the Unsloth code, it uses apply_qkv to get Q, K, and V, but the Hugging Face implementation does not have apply_qkv anymore:

bsz, q_len, _ = hidden_states.size()

query_states = self.q_proj(hidden_states)
key_states = self.k_proj(hidden_states)
value_states = self.v_proj(hidden_states)

# use -1 to infer num_heads and num_key_value_heads as they may vary if tensor parallel is used
query_states = query_states.view(bsz, q_len, -1, self.head_dim).transpose(1, 2)
key_states = key_states.view(bsz, q_len, -1, self.head_dim).transpose(1, 2)
value_states = value_states.view(bsz, q_len, -1, self.head_dim).transpose(1, 2)

First, check whether the transformers version you are using matches the one Unsloth expects.
If that does not work, I guess we might have to modify the model code again.
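
For readers following the traceback: it shows Unsloth's LlamaAttention_fast_forward running on a stock LlamaSdpaAttention instance that has no apply_qkv attribute. One way this can happen is a forward method replaced at class level while the helper it relies on is attached only to instances created by the patching loader. The toy sketch below illustrates that pattern. The names Attention, fast_forward and load_with_patch are hypothetical stand-ins, not Unsloth or transformers code, and the mechanism is an assumption inferred from the traceback.

```python
class Attention:
    """Hypothetical stand-in for a stock attention layer."""

    def __init__(self):
        self.q_proj = lambda x: x + 1

    def forward(self, x):
        return self.q_proj(x)


def fast_forward(self, x):
    # Stand-in for a patched forward that relies on a per-instance helper.
    return self.apply_qkv(self, x)


def load_with_patch():
    """Hypothetical loader: patches the class, then equips one instance."""
    Attention.forward = fast_forward  # class-level patch, affects every instance
    layer = Attention()
    layer.apply_qkv = lambda self, x: self.q_proj(x)  # instance-level helper
    return layer


patched_layer = load_with_patch()
print(patched_layer.forward(1))  # the loader's own instance works

fresh_layer = Attention()  # built later by other code, e.g. an evaluation harness
try:
    fresh_layer.forward(1)
except AttributeError as err:
    print(err)  # the fresh instance lacks apply_qkv
```

In this sketch the instance returned by the loader keeps working, while any instance constructed afterwards in the same process inherits the patched forward without the helper, which matches the shape of the error reported above.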

danielhanchen · 2024-12-12T09:18:41Z

Apologies for the delay. The issue is that Unsloth patches over transformers, so it is best to reload transformers once you've finished with Unsloth, i.e.:

import transformers
import importlib
importlib.reload(transformers)
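
An alternative that is not suggested in the thread, offered only as a sketch: run the evaluation in a separate Python process that never imports unsloth, so transformers is unpatched there. The helper names build_eval_command and evaluate_in_fresh_process are hypothetical, and the code assumes the installed lm_eval version supports the --model, --model_args and --tasks command-line options.

```python
import subprocess
import sys


def build_eval_command(checkpoint_path, tasks):
    """Build an lm_eval command line for a fresh interpreter (hypothetical helper)."""
    return [
        sys.executable, "-m", "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={checkpoint_path}",
        "--tasks", ",".join(tasks),
    ]


def evaluate_in_fresh_process(checkpoint_path, tasks):
    # The child process does not import unsloth, so it sees stock transformers.
    return subprocess.run(build_eval_command(checkpoint_path, tasks), check=True)


if __name__ == "__main__":
    # Only prints the command; call evaluate_in_fresh_process(...) to run it.
    print(build_eval_command("other_base_llama_3b_instruct", ["tinyMMLU"]))
```

With this layout, the script that loads and saves the model with Unsloth would call evaluate_in_fresh_process(checkpoint_path, ["tinyMMLU"]) after saving, instead of calling simple_evaluate in the same process.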
