
akozlo/InterpTools


InterpTools

InterpTools is intended to be a growing and improving toolkit of resources for doing interpretability work with LLM-style models. As we identify more tools that we need, we can continue to add them to this toolkit to make extracting, analyzing, and modifying model internals as simple, instructive, and fun as possible.

At this point, I am relying mainly on the TransformerLens Python library (transformer_lens). Documentation here:

Here are a few tutorial Colab notebooks that should also help you get started with TransformerLens and with model interpretability more generally:

About

Tools for AI Interpretability
