Releases: TransformerLensOrg/TransformerLens
v2.11.0
LLaMA 3.3 support! This release also includes a handful of usability improvements.
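A minimal sketch of loading one of the newly supported models. The exact checkpoint name is an assumption, and Llama weights are gated, so this also assumes you have accepted the license and authenticated with Hugging Face:

```python
from transformer_lens import HookedTransformer

# Checkpoint name is an assumption; adjust to the Llama 3.3 variant you have access to.
model = HookedTransformer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")

# Strings are tokenized automatically; the forward pass returns logits.
logits = model("The capital of France is")
```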
What's Changed
- Set prepend_bos to false by default for Qwen models by @degenfabian in #815
- Throw error when using attn_in with grouped query attention by @degenfabian in #810
- Feature llama 33 by @bryce13950 in #826
Full Changelog: v2.10.0...v2.11.0
v2.10.0
Huge update! This is likely going to be the last big 2.x update. This update greatly improves model implementation accuracy, and adds some of the newer Qwen models.
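As a usage sketch, the newly added Qwen2.5 models load like any other supported architecture (the checkpoint name below is an assumption), and the `default_prepend_bos` setting changed in this release can still be overridden per call when tokenizing:

```python
from transformer_lens import HookedTransformer

# Checkpoint name is an assumption; any supported Qwen2.5 size should load the same way.
model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B")

# default_prepend_bos comes from the model's config, but it can be overridden per call.
tokens = model.to_tokens("Hello world", prepend_bos=True)
logits = model(tokens)
```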
What's Changed
- Remove einsum in forward pass in AbstractAttention by @degenfabian in #783
- Colab compatibility bug fixes by @degenfabian in #794
- Remove einsum usage from create_alibi_bias function by @degenfabian in #781
- Actions token access by @bryce13950 in #797
- Remove einsum in apply_causal_mask in abstract_attention.py by @degenfabian in #782
- clarified arguments a bit for hook_points by @bryce13950 in #799
- Remove einsum in logit_attrs in ActivationCache by @degenfabian in #788
- Remove einsum in compute_head_results in ActivationCache by @degenfabian in #789
- Remove einsum usage in refactor_factored_attn_matrices in HookedTransformer by @degenfabian in #791
- Remove einsum usage in _get_w_in_matrix in SVDInterpreter by @degenfabian in #792
- Remove einsum usage in forward function of BertMLMHead by @degenfabian in #793
- Set default_prepend_bos to False in Bloom model configuration by @degenfabian in #806
- Remove einsum in complex_attn_linear by @degenfabian in #790
- Add a demo of collecting activations from a single location in the model (see the sketch after this list) by @adamkarvonen in #807
- Add support for Qwen_with_Questions by @degenfabian in #811
- Added support for Qwen2.5 by @israel-adewuyi in #809
- Updated devcontainers to use python3.11 by @jonasrohw in #812
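The activation-collection demo added in #807 comes down to hooking one named activation; a minimal sketch (the model and hook point below are chosen arbitrarily, not taken from the demo itself):

```python
import transformer_lens.utils as utils
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")   # small model chosen for illustration
hook_name = utils.get_act_name("resid_post", 5)     # residual stream after block 5

activations = []

def store_activation(tensor, hook):
    # For this hook point the tensor has shape [batch, position, d_model].
    activations.append(tensor.detach().cpu())

# Run the model while attaching a hook at just that one location.
model.run_with_hooks(
    "TransformerLens makes single-location capture easy.",
    fwd_hooks=[(hook_name, store_activation)],
)
print(activations[0].shape)
```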
New Contributors
- @israel-adewuyi made their first contribution in #809
- @jonasrohw made their first contribution in #812
Full Changelog: v2.9.1...v2.10.0
v2.9.1
Minor dependency change to address a change in an external dependency.
What's Changed
- added typeguard dependency by @bryce13950 in #786
Full Changelog: v2.9.0...v2.9.1
v2.9.0
Lots of accuracy improvements! A number of models now behave closer to how they behave in Transformers, and a new internal configuration has been added for ease of use!
What's Changed
- fix the bug that attention_mask and past_kv_cache cannot work together by @yzhhr in #772
- Set prepend_bos to false by default for Bloom model family by @degenfabian in #775
- Fix Bloom-family models producing incorrect outputs when use_past_kv_cache is set to True (see the sketch after this list) by @degenfabian in #777
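The two cache-related fixes above mainly affect cached generation; a minimal sketch of the code path they touch, assuming the bloom-560m checkpoint (any supported Bloom-family model would do):

```python
from transformer_lens import HookedTransformer

# Checkpoint name is an assumption; any Bloom-family model supported by
# TransformerLens exercises the fixed key/value-cache path.
model = HookedTransformer.from_pretrained("bigscience/bloom-560m")

# Generation with the KV cache enabled, which previously produced garbled
# output for Bloom models.
text = model.generate("The quick brown fox", max_new_tokens=20, use_past_kv_cache=True)
print(text)
```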
New Contributors
- @yzhhr made their first contribution in #772
- @degenfabian made their first contribution in #775
Full Changelog: v2.8.1...v2.9.0
v2.8.1
New notebook for comparing models, and a bug fix for handling newer LLaMA models!
What's Changed
- Logit comparator tool by @curt-tigges in #765
- Add support for NTK-by-Part Rotary Embedding & set correct rotary base for Llama-3.1 series by @Hzfinfdu in #764
Full Changelog: v2.8.0...v2.8.1
v2.8.0
What's Changed
- add transformer diagram by @akozlo in #749
- Demo colab compatibility by @bryce13950 in #752
- Add support for `Mistral-Nemo-Base-2407` model by @ryanhoangt in #751
- Fix the bug that the `tokenize_and_concatenate` function does not work for small datasets by @xy-z-code in #725
- added new block for recent diagram, and colab compatibility notebook by @bryce13950 in #758
- Add warning and halt execution for incorrect T5 model usage by @vatsalrathod16 in #757
- New issue template for reporting model compatibility by @bryce13950 in #759
- Add configurations for Llama 3.1 models(Llama-3.1-8B and Llama-3.1-70B) by @vatsalrathod16 in #761
New Contributors
- @akozlo made their first contribution in #749
- @ryanhoangt made their first contribution in #751
- @xy-z-code made their first contribution in #725
- @vatsalrathod16 made their first contribution in #757
Full Changelog: v2.7.1...v2.8.0
v2.7.1
What's Changed
- Updated broken Slack link by @neelnanda-io in #742
- `from_pretrained` has correct return type (i.e. `HookedSAETransformer.from_pretrained` returns `HookedSAETransformer`) by @callummcdougall in #743
- Avoid warning in `utils.download_file_from_hf` by @albertsgarde in #739
New Contributors
- @albertsgarde made their first contribution in #739
Full Changelog: v2.7.0...v2.7.1
v2.7.0
LLaMA 3.2 support! This release also adds the ability for `utils.test_prompt` to compare multiple prompts, as well as a minor typo fix.
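A minimal sketch of the helper; the single-prompt call is the long-standing signature, while passing lists for the multi-prompt comparison is an assumption about the updated signature (check the `utils.test_prompt` docstring):

```python
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")  # model chosen only for illustration

# Long-standing single-prompt usage: prints the rank and probability of the answer token(s).
utils.test_prompt("The Eiffel Tower is in the city of", " Paris", model)

# Assumption: the v2.7.0 change lets prompts and answers be passed as lists.
utils.test_prompt(
    ["The Eiffel Tower is in the city of", "The Colosseum is in the city of"],
    [" Paris", " Rome"],
    model,
)
```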
What's Changed
- Typo hooked encoder by @bryce13950 in #732
- `utils.test_prompt` compares multiple prompts by @callummcdougall in #733
- Model llama 3.2 by @bryce13950 in #734
Full Changelog: v2.6.0...v2.7.0
v2.6.0
Another nice little feature update! You now have the ability to ungroup the grouped query attention head component through a new config parameter, `ungroup_grouped_query_attention`!
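A minimal sketch of enabling the new flag. Whether it can be passed straight through `from_pretrained` is an assumption (check the current docs), and the model name is only an example of a grouped-query-attention architecture:

```python
from transformer_lens import HookedTransformer

# Assumptions: the config flag is forwarded by from_pretrained, and Mistral-7B is
# just one example of a model that uses grouped query attention.
model = HookedTransformer.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    ungroup_grouped_query_attention=True,
)

# With ungrouping enabled, the key/value projections should expose one slice per
# attention head rather than per KV group, which simplifies per-head analysis.
print(model.W_K.shape)
```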
What's Changed
- Ungrouping GQA by @hannamw & @FlyingPumba in #713
Full Changelog: v2.5.0...v2.6.0
v2.5.0
Nice little release! This release adds a new parameter named `first_n_layers` that allows you to specify how many layers of a model you want to load.
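A minimal sketch, assuming `first_n_layers` can be passed to `from_pretrained` (the model name is just an example):

```python
from transformer_lens import HookedTransformer

# Load only the first two transformer blocks of GPT-2 small.
model = HookedTransformer.from_pretrained("gpt2", first_n_layers=2)
print(model.cfg.n_layers)  # expected to report 2 if only two layers were loaded
```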
What's Changed
- Fix typo in bug issue template by @JasonGross in #715
- HookedTransformerConfig docs string: `weight_init_mode` => `init_mode` by @JasonGross in #716
- Allow loading only first n layers. by @joelburget in #717
Full Changelog: v2.4.1...v2.5.0