[Major] Update API namings and parameters #78

Merged · 10 commits · Jan 21, 2024
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
tutorials/* linguist-vendored
tests/qa_runbook.ipynb linguist-vendored
3 changes: 3 additions & 0 deletions .gitignore
@@ -7,6 +7,9 @@ playground.ipynb
*/*/.cache_dir
results*/
logs
_hidden_local_dev*
tmp_dir_new*/
tmp/
notebooks/figures/
*.tsv
*.csv
5 changes: 5 additions & 0 deletions CONTRIBUTING.md
@@ -25,6 +25,11 @@ When adding new methods or APIs, unit tests are now enforced. To run existing te
cd pyvene
python -m unittest discover -p '*TestCase.py'
```
For a specific test case, you can run
```bash
cd pyvene
python -m unittest tests.integration_tests.ComplexInterventionWithGPT2TestCase
```
When checking in new code, please also consider adding new tests in the same PR. Please include test results in the PR to make sure all existing test cases pass. Please see the `qa_runbook.ipynb` notebook for a set of conventions on how to add test cases. The code coverage for this repository is currently `low`, and we are adding more automated tests.
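As a concrete starting point, a new test file might look like the sketch below (the file, class, and method names here are illustrative only; `qa_runbook.ipynb` remains the authoritative reference for conventions):

```python
# tests/integration_tests/MyFeatureTestCase.py (hypothetical path)
# The *TestCase.py suffix matters: discovery uses the pattern '*TestCase.py'.
import unittest

import pyvene as pv


class MyFeatureTestCase(unittest.TestCase):
    def test_wrap_gpt2(self):
        # create_gpt2() returns (config, tokenizer, model) per the README examples
        _, tokenizer, gpt2 = pv.create_gpt2()
        pv_gpt2 = pv.IntervenableModel(
            {"layer": 0, "component": "mlp_output"}, model=gpt2
        )
        self.assertIsNotNone(pv_gpt2)


if __name__ == "__main__":
    unittest.main()
```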

#### Format
161 changes: 93 additions & 68 deletions README.md
@@ -11,10 +11,10 @@

*This is a beta release (public testing).*

# **Use _Activation Intervention_ to Interpret _Causal Mechanism_ of Model**
**pyvene** supports customizable interventions on different neural architectures (e.g., RNN or Transformers). It supports complex intervention schemas (e.g., parallel or serialized interventions) and a wide range of intervention modes (e.g., static or trained interventions) at scale to gain interpretability insights.
# A Library for _Understanding_ and _Improving_ PyTorch Models via Interventions
Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability. To facilitate such research, we introduce **pyvene**, an open-source Python library that supports customizable interventions on a range of different PyTorch modules. **pyvene** supports complex intervention schemes with an intuitive configuration format, and its interventions can be static or include trainable parameters.

**Getting Started:** [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/tutorials/basic_tutorials/Basic_Intervention.ipynb) [**_pyvene_ 101**]
**Getting Started:** [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/pyvene/pyvene_101.ipynb) [**Main _pyvene_ 101**]

## Installation
```bash
pip install pyvene
```
## _Wrap_, _Intervene_ and _Share_
You can intervene on supported models as,
```python
import pyvene
from pyvene import IntervenableRepresentationConfig, IntervenableConfig, IntervenableModel
from pyvene import VanillaIntervention
import torch
import pyvene as pv

# provided wrapper for huggingface gpt2 model
_, tokenizer, gpt2 = pyvene.create_gpt2()
_, tokenizer, gpt2 = pv.create_gpt2()

# turn gpt2 into intervenable_gpt2
intervenable_gpt2 = IntervenableModel(
    intervenable_config = IntervenableConfig(
        intervenable_representations=[
            IntervenableRepresentationConfig(
                0,            # intervening layer 0
                "mlp_output", # intervening mlp output
            ),
        ],
        intervenable_interventions_type=VanillaIntervention
    ),
    model = gpt2
)
pv_gpt2 = pv.IntervenableModel({
    "layer": 0,
    "component": "mlp_output",
    "source_representation": torch.zeros(
        gpt2.config.n_embd)
}, model=gpt2)

# intervene base with sources on the 4th token.
original_outputs, intervened_outputs = intervenable_gpt2(
    tokenizer("The capital of Spain is", return_tensors="pt"),
    [tokenizer("The capital of Italy is", return_tensors="pt")],
    {"sources->base": 4}
orig_outputs, intervened_outputs = pv_gpt2(
    base = tokenizer(
        "The capital of Spain is",
        return_tensors="pt"
    ),
    unit_locations={"base": 3}
)
original_outputs.last_hidden_state - intervened_outputs.last_hidden_state
print(intervened_outputs.last_hidden_state - orig_outputs.last_hidden_state)
```
which returns,

```
tensor([[[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0008, -0.0078, -0.0066, ..., 0.0007, -0.0018, 0.0060]]])
[ 0.0483, -0.1212, -0.2816, ..., 0.1958, 0.0830, 0.0784],
[ 0.0519, 0.2547, -0.1631, ..., 0.0050, -0.0453, -0.1624]]])
```
showing the causal effects of the intervention. You can share your interventions through Huggingface with others with a single call,
```python
intervenable_gpt2.save(
pv_gpt2.save(
    save_directory="./your_gpt2_mounting_point/",
    save_to_hf_hub=True,
    hf_repo_name="your_gpt2_mounting_point"
)
```
Interventions are knobs that can be mounted on models, and these knobs can be shared with others to spread knowledge about how to steer models. You can try this at [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/tutorials/basic_tutorials/Load_Save_and_Share_Interventions.ipynb) [**Intervention Sharing**]

You can also use `pv_gpt2` just like a regular torch model component inside another model, or another pipeline as,
```py
import torch
import torch.nn as nn
@@ -81,7 +81,7 @@ from typing import List, Optional, Tuple, Union, Dict
class ModelWithIntervenables(nn.Module):
    def __init__(self):
        super(ModelWithIntervenables, self).__init__()
        self.intervenable_gpt2 = intervenable_gpt2
        self.pv_gpt2 = pv_gpt2
        self.relu = nn.ReLU()
        self.fc = nn.Linear(768, 1)
        # Your other downstream components go here
@@ -94,33 +94,90 @@ class ModelWithIntervenables(nn.Module):
        activations_sources: Optional[Dict] = None,
        subspaces: Optional[List] = None,
    ):
        _, counterfactual_x = self.intervenable_gpt2(
        _, counterfactual_x = self.pv_gpt2(
            base,
            sources,
            unit_locations,
            activations_sources,
            subspaces
        )
        counterfactual_x = counterfactual_x.last_hidden_state

        counterfactual_x = self.relu(counterfactual_x)
        counterfactual_x = self.fc(counterfactual_x)
        return self.fc(self.relu(counterfactual_x.last_hidden_state))
```
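As a quick sanity check, the wrapped module can be called like any other `nn.Module`. This is a hypothetical usage sketch: the elided middle of the `forward` signature is assumed to accept `sources` and `unit_locations` as in the earlier examples.

```python
model = ModelWithIntervenables()
base = tokenizer("The capital of Spain is", return_tensors="pt")
sources = [tokenizer("The capital of Italy is", return_tensors="pt")]

# intervene on the 4th token (index 3), mirroring the earlier example
preds = model(base, sources, unit_locations={"sources->base": 3})
print(preds.shape)  # (batch, seq_len, 1) after the final nn.Linear(768, 1)
```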

## Complex _Intervention Schema_ as an _Object_
One key abstraction that **pyvene** provides is the encapsulation of the intervention schema. While the abstraction provides a good user interface, **pyvene** can still support relatively complex intervention schemas. The following helper function generates the schema configuration for *path patching* on the outputs of individual attention heads' OV circuits (i.e., analyzing the causal effect of each individual component):
```py
import pyvene as pv

def path_patching_config(
    layer, last_layer,
    component="head_attention_value_output", unit="h.pos",
):
    intervening_component = [
        {"layer": layer, "component": component, "unit": unit, "group_key": 0}]
    restoring_components = []
    if not component.startswith("mlp_"):
        restoring_components += [
            {"layer": layer, "component": "mlp_output", "group_key": 1}]
    for i in range(layer + 1, last_layer):
        restoring_components += [
            {"layer": i, "component": "attention_output", "group_key": 1},
            {"layer": i, "component": "mlp_output", "group_key": 1}
        ]
    intervenable_config = pv.IntervenableConfig(intervening_component + restoring_components)
    return intervenable_config
```
You can then wrap the config generated by this function around a model. After you have run your intervention, you can share your path patching with others,
```py
_, tokenizer, gpt2 = pv.create_gpt2()

pv_gpt2 = pv.IntervenableModel(
    path_patching_config(4, gpt2.config.n_layer),
    model=gpt2
)
# saving the path
pv_gpt2.save(
    save_directory="./your_gpt2_path/"
)
# loading the path
pv_gpt2 = pv.IntervenableModel.load(
    "./your_gpt2_path/",
    model=gpt2)
```


## Selected Tutorials

| **Level** | **Tutorial** | **Run in Colab** | **Description** |
| --- | ------------- | ------------- | ------------- |
| Beginner | [**Getting Started**](tutorials/basic_tutorials/Basic_Intervention.ipynb) | [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/tutorials/basic_tutorials/Basic_Intervention.ipynb) | Introduces basic static intervention on factual recall examples |
| Beginner | [**Intervened Model Generation**](tutorials/advanced_tutorials/Intervened_Model_Generation.ipynb) | [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/tutorials/advanced_tutorials/Intervened_Model_Generation.ipynb) | Shows how to intervene on a model during generation |
| Intermediate | [**Intervene Your Local Models**](tutorials/basic_tutorials/Add_New_Model_Type.ipynb) | [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/tutorials/basic_tutorials/Add_New_Model_Type.ipynb) | Illustrates how to run this library with your own models |
| Beginner | [**pyvene 101**](pyvene_101.ipynb) | [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/pyvene/pyvene_101.ipynb) | Introduces the basics of pyvene |
| Intermediate | [**ROME Causal Tracing**](tutorials/advanced_tutorials/Causal_Tracing.ipynb) | [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/tutorials/advanced_tutorials/Causal_Tracing.ipynb) | Reproduces ROME's results on factual associations with GPT2-XL |
| Intermediate | [**Intervention vs. Probing**](tutorials/advanced_tutorials/Probing_Gender.ipynb) | [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/tutorials/advanced_tutorials/Probing_Gender.ipynb) | Illustrates how to run trainable interventions and probing with pythia-6.9B |
| Advanced | [**Trainable Interventions for Causal Abstraction**](tutorials/advanced_tutorials/DAS_Main_Introduction.ipynb) | [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/tutorials/advanced_tutorials/DAS_Main_Introduction.ipynb) | Illustrates how to train an intervention to discover causal mechanisms of a neural model |

## Contributing to This Library
Please see [our guidelines](CONTRIBUTING.md) about how to contribute to this repository.

*Pull requests, bug reports, and all other forms of contribution are welcomed and highly encouraged!* :octocat:

### Other Ways of Installation

**Method 2: Install from the Repo**
```bash
pip install git+https://github.com/stanfordnlp/pyvene.git
```

**Method 3: Clone and Import**
```bash
git clone https://github.com/stanfordnlp/pyvene.git
```
and in a parallel folder, import it into your project as,
```python
from pyvene import pyvene
_, tokenizer, gpt2 = pyvene.create_gpt2()
```

## A Little Guide for Causal Abstraction: From Interventions to Gain Interpretability Insights
Basic interventions are fun, but we cannot make any causal claims systematically. To gain actual interpretability insights, we want to measure the counterfactual behaviors of a model in a data-driven fashion. In other words, if the model responds systematically to your interventions, then you can start to associate certain regions in the network with a high-level concept. We also call this process alignment search over model internals.
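For instance, here is a minimal sketch of an interchange intervention (activation patching) that swaps a mid-layer activation from a source run into a base run; the layer, component name, and token position are illustrative choices:

```python
import pyvene as pv

_, tokenizer, gpt2 = pv.create_gpt2()

# patch the residual stream output of layer 6 ("block_output")
pv_gpt2 = pv.IntervenableModel(
    {"layer": 6, "component": "block_output"}, model=gpt2
)

base = tokenizer("The capital of Spain is", return_tensors="pt")
source = tokenizer("The capital of Italy is", return_tensors="pt")

# copy the source activation into the base run at token position 3
_, counterfactual_outputs = pv_gpt2(
    base, [source], unit_locations={"sources->base": 3}
)
# if, over many such pairs, the model's behavior shifts systematically
# (e.g., toward "Rome"), this site is causally implicated in factual recall
```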

### Understanding Causal Mechanisms with Static Interventions
@@ -178,28 +225,6 @@ intervenable.train(
```
where you need to pass in a trainable dataset and your customized loss and metrics functions. The trained interventions can later be saved to your disk. You can also use `intervenable.evaluate()` to evaluate your interventions against customized objectives.
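For illustration, the customized loss and metrics callables might look like the following sketch; the exact argument names that `train()` expects are not shown in this excerpt, so treat the wiring as an assumption:

```python
import torch

def compute_loss(logits, labels):
    # next-token cross-entropy on the final position (illustrative)
    loss_fct = torch.nn.CrossEntropyLoss()
    return loss_fct(logits[:, -1, :], labels)

def compute_metrics(eval_preds, eval_labels):
    # accuracy against counterfactual labels (illustrative)
    correct = (eval_preds.argmax(dim=-1) == eval_labels).float()
    return {"accuracy": correct.mean().item()}
```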


## Related Works in Discovering Causal Mechanism of LLMs
If you would like to read more works in this area, here is a list of papers that try to align or discover the causal mechanisms of LLMs.
- [Causal Abstractions of Neural Networks](https://arxiv.org/abs/2106.02997): This paper introduces interchange intervention (a.k.a. activation patching or causal scrubbing). It tries to align a causal model with the model's representations.
4 changes: 3 additions & 1 deletion pyvene/__init__.py
@@ -2,7 +2,7 @@
from .data_generators.causal_model import CausalModel
from .models.intervenable_base import IntervenableModel
from .models.configuration_intervenable_model import IntervenableConfig
from .models.configuration_intervenable_model import IntervenableRepresentationConfig
from .models.configuration_intervenable_model import RepresentationConfig


# Interventions
@@ -24,6 +24,8 @@
from .models.interventions import ZeroIntervention
from .models.interventions import LocalistRepresentationIntervention
from .models.interventions import DistributedRepresentationIntervention
from .models.interventions import SourcelessIntervention
from .models.interventions import NoiseIntervention
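For context, the newly exported intervention types can presumably be plugged into the dict-style config shown in the README; a hypothetical sketch follows (the `intervention_type` key is an assumption here; check `pyvene_101.ipynb` for the authoritative form):

```python
import pyvene as pv

_, tokenizer, gpt2 = pv.create_gpt2()

# add noise to the layer-0 MLP input on every forward pass;
# "intervention_type" as a config key is an assumption
pv_gpt2 = pv.IntervenableModel({
    "layer": 0,
    "component": "mlp_input",
    "intervention_type": pv.NoiseIntervention,
}, model=gpt2)
```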


# Utils
26 changes: 26 additions & 0 deletions pyvene/models/basic_utils.py
@@ -130,6 +130,7 @@ def get_list_depth(lst):
        return 1 + max((get_list_depth(item) for item in lst), default=0)
    return 0


def get_batch_size(model_input):
"""
Get batch size based on the input
@@ -141,3 +142,28 @@ def get_batch_size(model_input):
            batch_size = v.shape[0]
            break
    return batch_size


def GET_LOC(
    LOC,
    unit="h.pos",
    batch_size=1,
):
    """
    Convert a simple (head, position) location into the nested,
    batched unit_locations format.
    """
    if unit == "h.pos":
        return [
            [
                [
                    [LOC[0]]
                ] * batch_size,
                [
                    [LOC[1]]
                ] * batch_size
            ]
        ]
    else:
        raise NotImplementedError(
            f"{unit} is not supported."
        )
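For example, with the default `unit="h.pos"` and `batch_size=1`, a (head, position) pair expands as:

```python
GET_LOC((4, 3))
# -> [[[[4]], [[3]]]]  (head 4 and position 3, each wrapped and
#    replicated once for the batch dimension)
```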