
Mechanistic Interpretability Projects

This repository houses research projects in mechanistic interpretability: reverse engineering neural networks.

Open In Colab

The goal of this notebook is to explore the phenomenon of bracket closing in the GPT-Neo 125M model, whereby it can correctly match opening brackets (, [, { and < with their corresponding closing versions ), ], } and >.

This is Problem 2.13 in Neel Nanda's 200 Concrete Open Problems in Mechanistic Interpretability. The first goal is to figure out how the model determines whether an opening or a closing bracket is more appropriate, and the second is to figure out how it knows which kind of bracket to produce: round, square, curly or angle.

The repository also contains some exploratory notebooks experimenting with models.
