
[Bug Report] Gemma-2-2b-it output logit doesn't match with huggingface #693

Open · 1 task done
yeutong opened this issue Aug 2, 2024 · 3 comments · Fixed by #694
Labels
complexity-high: Very complicated changes for people to address who are quite familiar with the code
implementation-inaccuracy: Any issues related to our implementation being off from the official version

Comments

yeutong commented Aug 2, 2024

Describe the bug
The output logits from transformer_lens and huggingface are quite different when using the Gemma-2-2b-it model.

Code example

import torch
import transformer_lens
from transformers import AutoTokenizer, AutoModelForCausalLM

device = 'cuda'
model_name = 'google/gemma-2-2b-it'
tl_model = transformer_lens.HookedTransformer.from_pretrained(model_name, device=device)

tokenizer = AutoTokenizer.from_pretrained(model_name)
hf_model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

inputs = tokenizer('Hello world', return_tensors="pt").to(device)

logits_tl = tl_model(inputs.input_ids, return_type='logits', prepend_bos=False)
logits_hf = hf_model(**inputs).logits

print((logits_tl[0, -1] - logits_hf[0, -1]).mean()) # 0.1159
print((logits_hf[0, -1]).min(), (logits_hf[0, -1]).max()) # -19.6916 16.0789
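
For a fuller picture than the mean (which can cancel across signs), a minimal follow-up sketch reusing the tensors above:

# Sketch (reuses logits_tl / logits_hf from the snippet above): max absolute
# difference and norm are less prone to sign cancellation than the mean.
diff = logits_tl[0, -1] - logits_hf[0, -1]
print(diff.abs().max().item(), diff.norm().item())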

System Info
transformer_lens 2.3.0, transformers 4.43.2

Additional context
The logit diff is quite large

Checklist

  • I have checked that there is no similar issue in the repo (required)
neelnanda-io (Collaborator) commented Aug 2, 2024 via email

yeutong (Author) commented Aug 2, 2024

Tried from_pretrained_no_processing and got the same results. It is more than the unembedding centering: the differences exist in the model activations and grow larger at each layer.
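
For reference, a minimal sketch of the no-processing load (model_name and device are reused from the reproduction above):

# Sketch: load without TransformerLens weight processing (no folding/centering);
# model_name and device are the same as in the reproduction above.
tl_model = transformer_lens.HookedTransformer.from_pretrained_no_processing(
    model_name, device=device
)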

def forward_with_cache(model, layer, inputs):
    # Capture the input to model.model.layers[layer], i.e. the residual stream
    # entering that layer (the HF analogue of TL's hook_resid_pre).
    cache = None
    def hook(module, inputs, outputs):
        nonlocal cache
        cache = inputs[0]
        return outputs

    hook_handle = model.model.layers[layer].register_forward_hook(hook)
    _ = model(**inputs)
    hook_handle.remove()

    return cache

resid_pre_diffs = []

for layer in range(tl_model.cfg.n_layers):
    hf_cache = forward_with_cache(hf_model, layer, inputs)
    _, tl_cache = tl_model.run_with_cache(inputs.input_ids, prepend_bos=False, names_filter=[f'blocks.{layer}.hook_resid_pre'])
    tl_cache = tl_cache[f'blocks.{layer}.hook_resid_pre']
    resid_pre_diff = (hf_cache - tl_cache)[0, -1].norm().item()
    resid_pre_diffs.append(resid_pre_diff)

import plotly.express as px
px.line(resid_pre_diffs, markers=True, labels={'index': 'Layer', 'value': 'norm of resid pre diff'}, title='Difference in resid_pre between HF and TL')
[Figure: 'Difference in resid_pre between HF and TL', norm of the resid_pre diff per layer, growing with depth]

mntss (Contributor) commented Aug 7, 2024

@yeutong the issue is caused by a different attention scale being used (~14.96 vs 16). The HF implementation also disables the attention logit soft capping for inference, but that is less important:

for b in tl_model.blocks:
    b.attn.attn_scale = 16                # match the attention scale used by HF
    b.attn.cfg.attn_scores_soft_cap = 0   # HF disables the soft cap at inference
# re-run the resid_pre comparison above to regenerate resid_diff
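
If it helps, a minimal sketch of re-checking the final logits after the patch above (reusing tl_model, hf_model, and inputs from the original reproduction):

# Sketch: re-run the logits comparison from the reproduction after overriding
# attn_scale and the soft cap; all names are reused from that snippet.
logits_tl = tl_model(inputs.input_ids, return_type='logits', prepend_bos=False)
logits_hf = hf_model(**inputs).logits
print((logits_tl[0, -1] - logits_hf[0, -1]).abs().max())  # should now be far smaller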

There is still some difference in the activations, but it is on the order of 5e-4 at the last layer. That is probably a deeper issue related to the use of einsum in the attention.
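
As a generic illustration of the kind of numerical discrepancy being suggested (not the TransformerLens attention code itself), two mathematically equivalent contractions can accumulate in different orders; the observed gap depends on backend and dtype and may be zero:

import torch

torch.manual_seed(0)
q = torch.randn(8, 64, 256)  # (heads, query_pos, d_head), arbitrary illustrative shapes
k = torch.randn(8, 64, 256)  # (heads, key_pos, d_head)

scores_einsum = torch.einsum('hqd,hkd->hqk', q, k)
scores_matmul = q @ k.transpose(-1, -2)

# Mathematically identical; any nonzero gap comes from floating-point accumulation order.
print((scores_einsum - scores_matmul).abs().max().item())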

mntss mentioned this issue Aug 7, 2024
bryce13950 added the complexity-high and implementation-inaccuracy labels Aug 16, 2024