This repository houses research projects in mechanistic interpretability: reverse engineering neural networks.
The goal of this notebook is to explore the phenomenon of bracket closing in the GPT-Neo 125M model, whereby it can correctly match open parentheses ([{<
with their corresponding closing versions )]}>
.
This is Problem 2.13 in Neel Nanda's 200 Concrete Open Problems in Mechanistic Interpretability. The first goal is to figure out how the model determines whether an opening or closing bracket is more appropriate, and the second is to figure out how it knows the correct kind: (
, [
, {
or <
.
Some notebooks messing around with models.