[Major] Update API namings and parameters #78

Merged · 10 commits · Jan 21, 2024
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
tutorials/* linguist-vendored
tests/qa_runbook.ipynb linguist-vendored
3 changes: 3 additions & 0 deletions .gitignore
@@ -7,6 +7,9 @@ playground.ipynb
*/*/.cache_dir
results*/
logs
_hidden_local_dev*
tmp_dir_new*/
tmp/
notebooks/figures/
*.tsv
*.csv
5 changes: 5 additions & 0 deletions CONTRIBUTING.md
@@ -25,6 +25,11 @@ When adding new methods or APIs, unit tests are now enforced. To run existing te
cd pyvene
python -m unittest discover -p '*TestCase.py'
```
For a specific test case, you can run
```bash
cd pyvene
python -m unittest tests.integration_tests.ComplexInterventionWithGPT2TestCase
```
When checking in new code, please also consider adding new tests in the same PR. Please include test results in the PR to make sure all existing test cases pass. Please see the `qa_runbook.ipynb` notebook for a set of conventions on how to add test cases. The code coverage for this repository is currently `low`, and we are adding more automated tests.
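As a concrete starting point, a new test file might look like the sketch below (the file, class, and method names here are illustrative only; `qa_runbook.ipynb` remains the authoritative reference for conventions):

```python
# tests/integration_tests/MyFeatureTestCase.py (hypothetical path)
# The *TestCase.py suffix matters: discovery uses the pattern '*TestCase.py'.
import unittest

import pyvene as pv


class MyFeatureTestCase(unittest.TestCase):
    def test_wrap_gpt2(self):
        # create_gpt2() returns (config, tokenizer, model) per the README examples
        _, tokenizer, gpt2 = pv.create_gpt2()
        pv_gpt2 = pv.IntervenableModel(
            {"layer": 0, "component": "mlp_output"}, model=gpt2
        )
        self.assertIsNotNone(pv_gpt2)


if __name__ == "__main__":
    unittest.main()
```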

#### Format
161 changes: 93 additions & 68 deletions README.md
@@ -11,10 +11,10 @@

*This is a beta release (public testing).*

# **Use _Activation Intervention_ to Interpret _Causal Mechanism_ of Model**
**pyvene** supports customizable interventions on different neural architectures (e.g., RNN or Transformers). It supports complex intervention schemas (e.g., parallel or serialized interventions) and a wide range of intervention modes (e.g., static or trained interventions) at scale to gain interpretability insights.
# A Library for _Understanding_ and _Improving_ PyTorch Models via Interventions
Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability. To facilitate such research, we introduce **pyvene**, an open-source Python library that supports customizable interventions on a range of different PyTorch modules. **pyvene** supports complex intervention schemes with an intuitive configuration format, and its interventions can be static or include trainable parameters.

**Getting Started:** [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/tutorials/basic_tutorials/Basic_Intervention.ipynb) [**_pyvene_ 101**]
**Getting Started:** [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/pyvene/pyvene_101.ipynb) [**Main _pyvene_ 101**]

## Installation
```bash
pip install pyvene
```
## _Wrap_, _Intervene_ and _Share_
You can intervene on supported models as,
```python
import pyvene
from pyvene import IntervenableRepresentationConfig, IntervenableConfig, IntervenableModel
from pyvene import VanillaIntervention
import torch
import pyvene as pv

# provided wrapper for huggingface gpt2 model
_, tokenizer, gpt2 = pyvene.create_gpt2()
_, tokenizer, gpt2 = pv.create_gpt2()

# turn gpt2 into intervenable_gpt2
intervenable_gpt2 = IntervenableModel(
    intervenable_config = IntervenableConfig(
        intervenable_representations=[
            IntervenableRepresentationConfig(
                0,            # intervening layer 0
                "mlp_output", # intervening mlp output
            ),
        ],
        intervenable_interventions_type=VanillaIntervention
    ),
    model = gpt2
)
pv_gpt2 = pv.IntervenableModel({
    "layer": 0,
    "component": "mlp_output",
    "source_representation": torch.zeros(
        gpt2.config.n_embd)
}, model=gpt2)

# intervene base with sources on the 4th token.
original_outputs, intervened_outputs = intervenable_gpt2(
    tokenizer("The capital of Spain is", return_tensors="pt"),
    [tokenizer("The capital of Italy is", return_tensors="pt")],
    {"sources->base": 4}
orig_outputs, intervened_outputs = pv_gpt2(
    base = tokenizer(
        "The capital of Spain is",
        return_tensors="pt"
    ),
    unit_locations={"base": 3}
)
original_outputs.last_hidden_state - intervened_outputs.last_hidden_state
print(intervened_outputs.last_hidden_state - orig_outputs.last_hidden_state)
```
which returns,

```
tensor([[[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0008, -0.0078, -0.0066, ..., 0.0007, -0.0018, 0.0060]]])
[ 0.0483, -0.1212, -0.2816, ..., 0.1958, 0.0830, 0.0784],
[ 0.0519, 0.2547, -0.1631, ..., 0.0050, -0.0453, -0.1624]]])
```
showing the causal effects of the intervention. You can share your interventions through Huggingface with others with a single call,
```python
intervenable_gpt2.save(
pv_gpt2.save(
    save_directory="./your_gpt2_mounting_point/",
    save_to_hf_hub=True,
    hf_repo_name="your_gpt2_mounting_point"
)
```
Interventions are knobs that can be mounted on models, and these knobs can be shared with others to spread knowledge about how to steer models. You can try this at [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/tutorials/basic_tutorials/Load_Save_and_Share_Interventions.ipynb) [**Intervention Sharing**]

You can also use `pv_gpt2` just like a regular torch model component inside another model, or another pipeline as,
```py
import torch
import torch.nn as nn
@@ -81,7 +81,7 @@ from typing import List, Optional, Tuple, Union, Dict
class ModelWithIntervenables(nn.Module):
    def __init__(self):
        super(ModelWithIntervenables, self).__init__()
        self.intervenable_gpt2 = intervenable_gpt2
        self.pv_gpt2 = pv_gpt2
        self.relu = nn.ReLU()
        self.fc = nn.Linear(768, 1)
        # Your other downstream components go here
@@ -94,33 +94,90 @@ class ModelWithIntervenables(nn.Module):
        activations_sources: Optional[Dict] = None,
        subspaces: Optional[List] = None,
    ):
        _, counterfactual_x = self.intervenable_gpt2(
        _, counterfactual_x = self.pv_gpt2(
            base,
            sources,
            unit_locations,
            activations_sources,
            subspaces
        )
        counterfactual_x = counterfactual_x.last_hidden_state

        counterfactual_x = self.relu(counterfactual_x)
        counterfactual_x = self.fc(counterfactual_x)
        return self.fc(self.relu(counterfactual_x.last_hidden_state))
```
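As a quick sanity check, the wrapped module can be called like any other `nn.Module`. This is a hypothetical usage sketch: the elided middle of the `forward` signature is assumed to accept `sources` and `unit_locations` as in the earlier examples.

```python
model = ModelWithIntervenables()
base = tokenizer("The capital of Spain is", return_tensors="pt")
sources = [tokenizer("The capital of Italy is", return_tensors="pt")]

# intervene on the 4th token (index 3), mirroring the earlier example
preds = model(base, sources, unit_locations={"sources->base": 3})
print(preds.shape)  # (batch, seq_len, 1) after the final nn.Linear(768, 1)
```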

## Complex _Intervention Schema_ as an _Object_
One key abstraction that **pyvene** provides is the encapsulation of the intervention schema. While the abstraction provides a good user interface, **pyvene** can still support relatively complex intervention schemas. The following helper function generates the schema configuration for *path patching* on the outputs of individual attention heads' OV circuits (i.e., analyzing the causal effect of each individual component):
```py
import pyvene as pv

def path_patching_config(
    layer, last_layer,
    component="head_attention_value_output", unit="h.pos",
):
    intervening_component = [
        {"layer": layer, "component": component, "unit": unit, "group_key": 0}]
    restoring_components = []
    if not component.startswith("mlp_"):
        restoring_components += [
            {"layer": layer, "component": "mlp_output", "group_key": 1}]
    for i in range(layer + 1, last_layer):
        restoring_components += [
            {"layer": i, "component": "attention_output", "group_key": 1},
            {"layer": i, "component": "mlp_output", "group_key": 1}
        ]
    intervenable_config = pv.IntervenableConfig(intervening_component + restoring_components)
    return intervenable_config
```
You can then wrap the config generated by this function around a model. After you have run your intervention, you can share your path patching with others,
```py
_, tokenizer, gpt2 = pv.create_gpt2()

pv_gpt2 = pv.IntervenableModel(
    path_patching_config(4, gpt2.config.n_layer),
    model=gpt2
)
# saving the path
pv_gpt2.save(
    save_directory="./your_gpt2_path/"
)
# loading the path
pv_gpt2 = pv.IntervenableModel.load(
    "./your_gpt2_path/",
    model=gpt2)
```


## Selected Tutorials

| **Level** | **Tutorial** | **Run in Colab** | **Description** |
| --- | ------------- | ------------- | ------------- |
| Beginner | [**Getting Started**](tutorials/basic_tutorials/Basic_Intervention.ipynb) | [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/tutorials/basic_tutorials/Basic_Intervention.ipynb) | Introduces basic static intervention on factual recall examples |
| Beginner | [**Intervened Model Generation**](tutorials/advanced_tutorials/Intervened_Model_Generation.ipynb) | [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/tutorials/advanced_tutorials/Intervened_Model_Generation.ipynb) | Shows how to intervene on a model during generation |
| Intermediate | [**Intervene Your Local Models**](tutorials/basic_tutorials/Add_New_Model_Type.ipynb) | [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/tutorials/basic_tutorials/Add_New_Model_Type.ipynb) | Illustrates how to run this library with your own models |
| Beginner | [**pyvene 101**](pyvene_101.ipynb) | [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/pyvene/pyvene_101.ipynb) | Introduces the basics of pyvene |
| Intermediate | [**ROME Causal Tracing**](tutorials/advanced_tutorials/Causal_Tracing.ipynb) | [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/tutorials/advanced_tutorials/Causal_Tracing.ipynb) | Reproduces ROME's results on factual associations with GPT2-XL |
| Intermediate | [**Intervention vs. Probing**](tutorials/advanced_tutorials/Probing_Gender.ipynb) | [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/tutorials/advanced_tutorials/Probing_Gender.ipynb) | Illustrates how to run trainable interventions and probing with pythia-6.9B |
| Advanced | [**Trainable Interventions for Causal Abstraction**](tutorials/advanced_tutorials/DAS_Main_Introduction.ipynb) | [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/pyvene/blob/main/tutorials/advanced_tutorials/DAS_Main_Introduction.ipynb) | Illustrates how to train an intervention to discover causal mechanisms of a neural model |

## Contributing to This Library
Please see [our guidelines](CONTRIBUTING.md) about how to contribute to this repository.

*Pull requests, bug reports, and all other forms of contribution are welcomed and highly encouraged!* :octocat:

### Other Ways of Installation

**Method 2: Install from the Repo**
```bash
pip install git+https://github.com/stanfordnlp/pyvene.git
```

**Method 3: Clone and Import**
```bash
git clone https://github.com/stanfordnlp/pyvene.git
```
and in a parallel folder, import it into your project as,
```python
from pyvene import pyvene
_, tokenizer, gpt2 = pyvene.create_gpt2()
```

## A Little Guide for Causal Abstraction: From Interventions to Gain Interpretability Insights
Basic interventions are fun, but we cannot make any causal claims systematically. To gain actual interpretability insights, we want to measure the counterfactual behaviors of a model in a data-driven fashion. In other words, if the model responds systematically to your interventions, then you can start to associate certain regions in the network with a high-level concept. We also call this process alignment search over model internals.
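For instance, here is a minimal sketch of an interchange intervention (activation patching) that swaps a mid-layer activation from a source run into a base run; the layer, component name, and token position are illustrative choices:

```python
import pyvene as pv

_, tokenizer, gpt2 = pv.create_gpt2()

# patch the residual stream output of layer 6 ("block_output")
pv_gpt2 = pv.IntervenableModel(
    {"layer": 6, "component": "block_output"}, model=gpt2
)

base = tokenizer("The capital of Spain is", return_tensors="pt")
source = tokenizer("The capital of Italy is", return_tensors="pt")

# copy the source activation into the base run at token position 3
_, counterfactual_outputs = pv_gpt2(
    base, [source], unit_locations={"sources->base": 3}
)
# if, over many such pairs, the model's behavior shifts systematically
# (e.g., toward "Rome"), this site is causally implicated in factual recall
```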

### Understanding Causal Mechanisms with Static Interventions
@@ -178,28 +225,6 @@ intervenable.train(
```
where you need to pass in a trainable dataset and your customized loss and metrics functions. The trained interventions can later be saved to your disk. You can also use `intervenable.evaluate()` to evaluate your interventions against customized objectives.
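For illustration, the customized loss and metrics callables might look like the following sketch; the exact argument names that `train()` expects are not shown in this excerpt, so treat the wiring as an assumption:

```python
import torch

def compute_loss(logits, labels):
    # next-token cross-entropy on the final position (illustrative)
    loss_fct = torch.nn.CrossEntropyLoss()
    return loss_fct(logits[:, -1, :], labels)

def compute_metrics(eval_preds, eval_labels):
    # accuracy against counterfactual labels (illustrative)
    correct = (eval_preds.argmax(dim=-1) == eval_labels).float()
    return {"accuracy": correct.mean().item()}
```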


## Related Works in Discovering Causal Mechanism of LLMs
If you would like to read more works in this area, here is a list of papers that try to align or discover the causal mechanisms of LLMs.
- [Causal Abstractions of Neural Networks](https://arxiv.org/abs/2106.02997): This paper introduces interchange intervention (a.k.a. activation patching or causal scrubbing). It tries to align a causal model with the model's representations.
4 changes: 3 additions & 1 deletion pyvene/__init__.py
@@ -2,7 +2,7 @@
from .data_generators.causal_model import CausalModel
from .models.intervenable_base import IntervenableModel
from .models.configuration_intervenable_model import IntervenableConfig
from .models.configuration_intervenable_model import IntervenableRepresentationConfig
from .models.configuration_intervenable_model import RepresentationConfig


# Interventions
@@ -24,6 +24,8 @@
from .models.interventions import ZeroIntervention
from .models.interventions import LocalistRepresentationIntervention
from .models.interventions import DistributedRepresentationIntervention
from .models.interventions import SourcelessIntervention
from .models.interventions import NoiseIntervention
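For context, the newly exported intervention types can presumably be plugged into the dict-style config shown in the README; a hypothetical sketch follows (the `intervention_type` key is an assumption here; check `pyvene_101.ipynb` for the authoritative form):

```python
import pyvene as pv

_, tokenizer, gpt2 = pv.create_gpt2()

# add noise to the layer-0 MLP input on every forward pass;
# "intervention_type" as a config key is an assumption
pv_gpt2 = pv.IntervenableModel({
    "layer": 0,
    "component": "mlp_input",
    "intervention_type": pv.NoiseIntervention,
}, model=gpt2)
```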


# Utils
26 changes: 26 additions & 0 deletions pyvene/models/basic_utils.py
@@ -130,6 +130,7 @@ def get_list_depth(lst):
        return 1 + max((get_list_depth(item) for item in lst), default=0)
    return 0


def get_batch_size(model_input):
"""
Get batch size based on the input
@@ -141,3 +142,28 @@ def get_batch_size(model_input):
            batch_size = v.shape[0]
            break
    return batch_size


def GET_LOC(
    LOC,
    unit="h.pos",
    batch_size=1,
):
    """
    Convert a simple (head, position) location into the nested,
    batched unit_locations format.
    """
    if unit == "h.pos":
        return [
            [
                [
                    [LOC[0]]
                ] * batch_size,
                [
                    [LOC[1]]
                ] * batch_size
            ]
        ]
    else:
        raise NotImplementedError(
            f"{unit} is not supported."
        )
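For example, with the default `unit="h.pos"` and `batch_size=1`, a (head, position) pair expands as:

```python
GET_LOC((4, 3))
# -> [[[[4]], [[3]]]]  (head 4 and position 3, each wrapped and
#    replicated once for the batch dimension)
```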