I am trying to fine-tune a model with unsloth and evaluate the fine-tuned model with lm_eval. However, when I save the unsloth model after training it, loading the checkpoint breaks lm_eval. Minimal reproducible example (without the fine-tuning, just loading and saving):
from lm_eval import simple_evaluate
from unsloth import FastLanguageModel

# Load with unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="../base_llama_3b_instruct",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=False,
)

# Save the model and tokenizer
checkpoint_path = "other_base_llama_3b_instruct"
model.save_pretrained(checkpoint_path)
tokenizer.save_pretrained(checkpoint_path)

# Evaluate the model
results = simple_evaluate(
    model="hf",
    model_args=f"pretrained={checkpoint_path}",
    tasks=["tinyMMLU"],
)
This gives:
root@9adce86cc0bf:/app/evaluation# python temp.py
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))== Unsloth 2024.12.4: Fast Llama patching. Transformers:4.46.2.
\\ /| GPU: NVIDIA A10G. Max memory: 21.975 GB. Platform: Linux.
O^O/ \_/ \ Torch: 2.5.0+cu124. CUDA: 8.6. CUDA Toolkit: 12.4. Triton: 3.1.0
\ / Bfloat16 = TRUE. FA [Xformers = 0.0.28.post2. FA2 = False]
"-____-" Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
2024-12-10:16:11:37,332 INFO [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2024-12-10:16:11:37,332 INFO [evaluator.py:201] Initializing hf model, with arguments: {'pretrained': 'other_base_llama_3b_instruct'}
2024-12-10:16:11:37,370 INFO [huggingface.py:129] Using device 'cuda'
2024-12-10:16:11:37,371 INFO [huggingface.py:481] Using model type 'default'
2024-12-10:16:11:37,895 INFO [huggingface.py:365] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda'}
2024-12-10:16:11:47,707 INFO [task.py:415] Building contexts for tinyMMLU on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 3292.65it/s]
2024-12-10:16:11:47,744 INFO [evaluator.py:489] Running loglikelihood requests
Running loglikelihood requests: 0%| | 0/400 [00:00<?, ?it/s]Traceback (most recent call last):
File "/app/evaluation/temp.py", line 18, in <module>
results = simple_evaluate(
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/utils.py", line 397, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/evaluator.py", line 301, in simple_evaluate
results = evaluate(
^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/utils.py", line 397, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/evaluator.py", line 500, in evaluate
resps = getattr(lm, reqtype)(cloned_reqs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/api/model.py", line 378, in loglikelihood
return self._loglikelihood_tokens(new_reqs, disable_tqdm=disable_tqdm)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/models/huggingface.py", line 1119, in _loglikelihood_tokens
self._model_call(batched_inps, **call_kwargs), dim=-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/models/huggingface.py", line 832, in _model_call
return self.model(inps).logits
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 969, in _CausalLM_fast_forward
outputs = self.model(
^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 832, in LlamaModel_fast_forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 493, in LlamaDecoderLayer_fast_forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 362, in LlamaAttention_fast_forward
Q, K, V = self.apply_qkv(self, hidden_states)
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1931, in __getattr__
raise AttributeError(
AttributeError: 'LlamaSdpaAttention' object has no attribute 'apply_qkv'
Running loglikelihood requests: 0%|
I also tried using model.save_pretrained_merged() without any luck. How should I do this?
This is interesting. There are a few possible causes here.
Maybe installing lm_eval pulled in a newer version of transformers than the one unsloth expects.
I have not kept up with the transformers code, but it seems they may have updated the Llama modeling file to use the code below.
Looking at the unsloth code, it calls apply_qkv to get Q, K, V, but the Hugging Face attention module in the traceback (LlamaSdpaAttention) has no apply_qkv attribute and projects Q, K, V directly:
bsz, q_len, _=hidden_states.size()
query_states=self.q_proj(hidden_states)
key_states=self.k_proj(hidden_states)
value_states=self.v_proj(hidden_states)
# use -1 to infer num_heads and num_key_value_heads as they may vary if tensor parallel is used
query_states=query_states.view(bsz, q_len, -1, self.head_dim).transpose(1, 2)
key_states=key_states.view(bsz, q_len, -1, self.head_dim).transpose(1, 2)
value_states=value_states.view(bsz, q_len, -1, self.head_dim).transpose(1, 2)
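One possible reading of the traceback, sketched as a toy example: lm_eval's call ends up in unsloth's patched forward, but on an attention object that never received apply_qkv. The snippet below reproduces that shape of failure in plain Python. Every name in it (Attention, fast_forward, custom_loader, plain_loader) is hypothetical and only stands in for the real classes; it is an assumption about the mechanism, not unsloth's or transformers' actual code.

```python
class Attention:
    """Hypothetical stand-in for an attention module."""

    def forward(self, x):
        # Original, unpatched behaviour.
        return x


def fast_forward(self, x):
    # Patched forward: depends on a helper attribute that is only
    # attached to instances by the custom loader below.
    return self.apply_qkv(self, x)


def custom_loader():
    # Patches the class for the whole process, then attaches the helper
    # to the one instance it creates.
    Attention.forward = fast_forward
    module = Attention()
    module.apply_qkv = lambda self, x: x * 2
    return module


def plain_loader():
    # Builds an instance without attaching the helper. Because the class
    # was already patched, this instance still uses fast_forward.
    return Attention()


patched = custom_loader()
print(patched.forward(3))  # works: the helper is present

plain = plain_loader()
try:
    plain.forward(3)
except AttributeError as err:
    # Same kind of error as in the traceback: missing 'apply_qkv'.
    print(err)
```

If this is what happens here, the transformers version alone would not explain the error, since the patched forward and the plainly loaded model live in the same Python process.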
First, check whether the transformers version you have installed matches the one unsloth expects.
If that does not help, I guess we might have to modify the model code again.
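For the version check, a minimal sketch using only the standard library is below. The package list and the helper name installed_versions are my own choices for illustration. The log above reports Unsloth 2024.12.4 with Transformers 4.46.2, so the output can be compared against that.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("unsloth", "unsloth_zoo", "transformers", "lm_eval", "torch")):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package so the versions can be pasted into the issue.
for name, ver in installed_versions().items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Running this in the same environment as temp.py shows whether installing lm_eval changed the transformers version that unsloth was installed with.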