Feature/acces single heads #40
Conversation
…-style transformers
…_pass with single head access
…ss single-head activations
# currently, accessing single head activations is only supported for GPT2LMHead models
if (self.source.head is not None and 'gpt2' not in self.source.model_name or
        self.target.head is not None and 'gpt2' not in self.target.model_name):
Does this work with other GPT models (e.g., GPT-J)?
Unfortunately not.
GPT-J, despite being similar, uses a different attention implementation (GPTJAttention: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gptj/modeling_gptj.py#L100)
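For reference, the attention submodules are laid out differently in the two implementations, which is why the GPT-2 hook points don't carry over. A quick check (only GPT-2 is loaded here; the GPT-J module names are taken from the linked modeling_gptj.py):

from transformers import GPT2LMHeadModel

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")

# GPT2Attention fuses Q/K/V into a single Conv1D (c_attn) and projects the
# merged heads with c_proj, which is what the current code relies on.
print(dict(gpt2.transformer.h[0].attn.named_children()).keys())
# e.g. dict_keys(['c_attn', 'c_proj', 'attn_dropout', 'resid_dropout'])

# GPTJAttention (see the link above) instead defines separate q_proj, k_proj,
# v_proj and out_proj Linears, so there is no c_attn/c_proj to target.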
We need to implement a mechanism that works across a range of model architectures.
Yeah, good news is that we're starting to see a pattern emerge.
I'm thinking we want to have a base ModelAccessor class that looks like:

from abc import ABC
from torch import Tensor

class ModelAccessor(ABC):
    def get_block_output(self, position: list[int], layer: int) -> Tensor:
        raise NotImplementedError(...)

    def set_block_output(self, position: list[int], layer: int, value: Tensor) -> None:
        raise NotImplementedError(...)

    def get_head_attn(self, position: list[int], layer: int, head: list[int]) -> Tensor:
        raise NotImplementedError(...)

    def set_head_attn(self, position: list[int], layer: int, head: list[int], value: Tensor) -> None:
        raise NotImplementedError(...)
and each model can implement this class.
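For instance, a GPT-2-specific implementation might look roughly like the sketch below (GPT2Accessor, its constructor arguments and the hook bookkeeping are hypothetical illustrations rather than code from this PR; only the two getters are shown):

import torch
from torch import Tensor
from transformers import GPT2LMHeadModel

class GPT2Accessor(ModelAccessor):
    """Sketch of a GPT-2 implementation of the proposed interface (getters only)."""

    def __init__(self, model: GPT2LMHeadModel, input_ids: torch.Tensor):
        self.model = model
        self.input_ids = input_ids

    def get_block_output(self, position: list[int], layer: int) -> Tensor:
        captured = {}
        handle = self.model.transformer.h[layer].register_forward_hook(
            lambda mod, inp, out: captured.__setitem__("hidden", out[0])
        )
        with torch.no_grad():
            self.model(self.input_ids)
        handle.remove()
        return captured["hidden"][:, position, :]

    def get_head_attn(self, position: list[int], layer: int, head: list[int]) -> Tensor:
        attn = self.model.transformer.h[layer].attn
        captured = {}
        # the per-head attention outputs ("z") are the *input* of c_proj
        handle = attn.c_proj.register_forward_pre_hook(
            lambda mod, inp: captured.__setitem__("z", inp[0])
        )
        with torch.no_grad():
            self.model(self.input_ids)
        handle.remove()
        z = captured["z"][:, position, :]
        # un-merge the hidden dimension into (num_heads, head_dim) and select heads
        z = z.view(*z.shape[:-1], attn.num_heads, attn.head_dim)
        return z[..., head, :]

A GPT-J subclass would presumably follow the same pattern with out_proj as the hook target, which is what makes one accessor per attention implementation feel like the right granularity.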
obvs/patchscope.py
if self.source.head is not None:
    attn = getattr(layer, self.ATTN_SOURCE)
    # TODO may not be .input for other models
    head_act = getattr(attn, self.HEAD_SOURCE).input[0][0]
Why are we using input instead of output? My understanding is that patchscope always uses output, and if a researcher needs an input from layer i, they can access the output from layer i-1.
The problem is that the output of the c_attn layer in GPT2Attention is not the same as the input of c_proj.
c_attn.output gives us the Q, K and V projections concatenated into one tensor. What we want are the attention outputs (sometimes referred to as z-values), which are computed in between the c_attn and c_proj forward calls inside the GPT2Attention object. So they are the input of c_proj, but not the output of c_attn.
See GPT2Attention.forward for reference (https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py#L306)
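To make this concrete, here is a minimal sketch using plain PyTorch hooks (the capture dict and the example prompt are ad hoc for illustration, not code from this PR):

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2Tokenizer.from_pretrained("gpt2")
capture = {}

attn = model.transformer.h[0].attn
# c_attn.output is the fused projection: Q, K and V concatenated along the last dim
attn.c_attn.register_forward_hook(
    lambda mod, inp, out: capture.__setitem__("c_attn_out", out)
)
# c_proj.input is the per-head attention output ("z"), with heads merged back to hidden_size
attn.c_proj.register_forward_pre_hook(
    lambda mod, inp: capture.__setitem__("c_proj_in", inp[0])
)

ids = tok("The quick brown fox", return_tensors="pt").input_ids
with torch.no_grad():
    model(ids)

print(capture["c_attn_out"].shape)  # (1, seq_len, 3 * 768): Q/K/V, not what we want
print(capture["c_proj_in"].shape)   # (1, seq_len, 768): the z-values
# a single head's activation is one head_dim-sized slice of the merged z tensor
head, head_dim = 5, attn.head_dim
z_head = capture["c_proj_in"][..., head * head_dim : (head + 1) * head_dim]
print(z_head.shape)  # (1, seq_len, 64)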
I... see. So c_proj is the equivalent of W^O in the original transformer paper? If so, I agree that the concatenated head outputs would be at .attn.c_proj.input.
I think this PR is logically sound. But do you want to merge #41 first to fix CI before merging this one?
You know what, let's merge this. I'll rebase the other one.
Add support for accessing single-head activations in the source_forward_pass and target_forward_pass methods of Patchscope.
Warning: this currently only works for models with the GPT2Attention implementation (https://github.com/huggingface/transformers/blob/v4.39.2/src/transformers/models/gpt2/modeling_gpt2.py#L123). We probably need one implementation per attention architecture.
Modify activation_patching_ioi.py to create a plot by layer & head.
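For reference, a minimal sketch of the kind of layer-by-head plot meant here (the results array, dimensions and file name are placeholders, not the actual script):

import numpy as np
import matplotlib.pyplot as plt

# results[layer, head] = patching effect for that (layer, head) pair; placeholder data
n_layers, n_heads = 12, 12
results = np.random.rand(n_layers, n_heads)

fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(results, aspect="auto", cmap="RdBu", origin="lower")
ax.set_xlabel("Head")
ax.set_ylabel("Layer")
ax.set_title("Activation patching effect by layer and head")
fig.colorbar(im, ax=ax, label="Patching effect")
plt.savefig("activation_patching_by_layer_head.png")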