
Feature/acces single heads #40

Merged: 13 commits merged from feature/acces_single_heads into main on Apr 11, 2024

Conversation

@llinauer llinauer commented Apr 7, 2024

obvs/patchscope.py

# currently, accessing single head activations is only supported for GPT2LMHead models
if (self.source.head is not None and 'gpt2' not in self.source.model_name or
        self.target.head is not None and 'gpt2' not in self.target.model_name):
Collaborator

Does this work with other GPT models (e.g., GPT-J)?

Collaborator Author

Unfortunately not.
GPT-J, despite being similar, uses a different attention implementation (GPTJAttention: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gptj/modeling_gptj.py#L100)

Collaborator Author

We need to implement a mechanism that works for a range of model architectures.

Collaborator
@tvhong tvhong Apr 11, 2024

Yeah, good news is that we're starting to see a pattern emerge.

I'm thinking we want to have a base ModelAccessor class that looks like:

from abc import ABC

from torch import Tensor


class ModelAccessor(ABC):
    def get_block_output(self, position: list[int], layer: int) -> Tensor:
        raise NotImplementedError(...)

    def set_block_output(self, position: list[int], layer: int) -> None:
        raise NotImplementedError(...)

    def get_head_attn(self, position: list[int], layer: int, head: list[int]) -> Tensor:
        raise NotImplementedError(...)

    def set_head_attn(self, position: list[int], layer: int, head: list[int]) -> None:
        raise NotImplementedError(...)

and each model can implement this class.
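
For example, here is a rough sketch of what one concrete implementation could look like, assuming the base class above and a Hugging Face GPT2LMHeadModel. The name GPT2Accessor, the idea of binding the accessor to fixed input_ids, and the choice to implement only get_block_output (via output_hidden_states) are illustrations, not part of this PR:

import torch
from torch import Tensor
from transformers import GPT2LMHeadModel, GPT2Tokenizer


class GPT2Accessor(ModelAccessor):
    """Hypothetical accessor bound to a GPT-2 model and a fixed prompt."""

    def __init__(self, model: GPT2LMHeadModel, input_ids: Tensor):
        self.model = model
        self.input_ids = input_ids

    def get_block_output(self, position: list[int], layer: int) -> Tensor:
        with torch.no_grad():
            out = self.model(self.input_ids, output_hidden_states=True)
        # hidden_states[0] is the embedding output, so index layer + 1
        # is the residual stream right after transformer block `layer`
        return out.hidden_states[layer + 1][:, position, :]


model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
accessor = GPT2Accessor(model, tokenizer("The quick brown fox", return_tensors="pt").input_ids)
resid = accessor.get_block_output(position=[0, 1], layer=5)  # shape (1, 2, 768)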

obvs/patchscope.py
if self.source.head is not None:
    attn = getattr(layer, self.ATTN_SOURCE)
    # TODO may not be .input for other models
    head_act = getattr(attn, self.HEAD_SOURCE).input[0][0]
Collaborator

Why are we using input instead of output?

My understanding is that patchscope always uses output, and if a researcher needs an input from layer i, they can access the output from layer i-1.

Collaborator Author

The problem is that the output of the c_attn layer in GPT2Attention is not the same as the input of c_proj.
c_attn.output gives us the Q, K & V values concatenated together into one tensor. We want the attention layer outputs (sometimes referred to as z-values), which are calculated in between the c_attn and c_proj forward calls in the GPT2Attention object. So they are the input of c_proj, but not the output of c_attn.
See GPT2Attention.forward for reference (https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py#L306).
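
To make that concrete, here is a rough standalone sketch, using plain transformers and a forward pre-hook rather than the nnsight path this PR uses, of grabbing the per-head z-values at c_proj's input. The prompt and the layer/head indices are arbitrary:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

layer, head = 5, 3  # arbitrary indices, just for illustration
captured = {}

def grab_c_proj_input(module, args):
    # args[0] is the merged z tensor passed into c_proj, shape (batch, seq, n_embd)
    captured["z"] = args[0].detach()

hook = model.transformer.h[layer].attn.c_proj.register_forward_pre_hook(grab_c_proj_input)
with torch.no_grad():
    model(**tokenizer("The quick brown fox", return_tensors="pt"))
hook.remove()

n_head = model.config.n_head              # 12 for gpt2
head_dim = model.config.n_embd // n_head  # 64 for gpt2
z = captured["z"]                         # (batch, seq, 768)
z_per_head = z.reshape(*z.shape[:2], n_head, head_dim)
single_head_act = z_per_head[:, :, head, :]  # activations of a single attention head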

Collaborator
@tvhong tvhong Apr 11, 2024

I... see. So c_proj is the equivalent of W^O in the original Transformer paper?

If so, I agree that the concatenated heads would be at .attn.c_proj.input.

(screenshot from the paper)

https://arxiv.org/pdf/1706.03762.pdf
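
For reference, the definition from Section 3.2.2 of the paper, which is why c_proj plays the role of $W^O$:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O, \qquad \mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V)$$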

@tvhong tvhong commented Apr 11, 2024

I think this PR is logically sound.

But do you want to merge #41 first to fix CI before merging in this PR?

@tvhong tvhong merged commit 2d4a407 into main Apr 11, 2024
2 of 3 checks passed
@tvhong tvhong commented Apr 11, 2024

You know what, let's merge this. I'll rebase the other one.

@llinauer llinauer deleted the feature/acces_single_heads branch April 11, 2024 09:12