
Mechanistic Interpretability Projects

This repository houses research projects in mechanistic interpretability: reverse engineering neural networks.

Open In Colab

The goal of this notebook is to explore the phenomenon of bracket closing in the GPT-Neo 125M model, whereby it can correctly match opening brackets (, [, { and < with their corresponding closing versions ), ], } and >.

This is Problem 2.13 in Neel Nanda's 200 Concrete Open Problems in Mechanistic Interpretability. The first goal is to figure out how the model determines whether an opening or a closing bracket is more appropriate, and the second is to figure out how it knows which kind of bracket to produce: round, square, curly or angle.

The repository also contains some exploratory notebooks experimenting with models.
