Update on "add eval for attention sink"
This PR adds a function to evaluate the model's perplexity when AttentionSink is enabled. It is largely adapted from https://github.com/mit-han-lab/streaming-llm/blob/main/examples/eval_long_ppl.py, the script the AttentionSink paper uses for the same evaluation.

Differential Revision: [D66474732](https://our.internmc.facebook.com/intern/diff/D66474732/)

Perplexity measured for the Llama 3.2 1B and 1B_Instruct models, up to 40k tokens, with AttentionSink enabled:

![Perplexity plot for Llama 3.2 1B and 1B_Instruct up to 40k tokens with AttentionSink enabled](https://github.com/user-attachments/assets/ba7118f9-b5d7-4de8-b1fa-7d2ba0646515)

[ghstack-poisoned]
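For context, the evaluation in `eval_long_ppl.py` scores text one token at a time while evicting KV-cache entries so that only the first few "sink" positions and a recent window are kept. A minimal sketch of that eviction policy and the final perplexity computation is below; the function names and the list-based cache stand-in are illustrative, not the actual implementation in this PR.

```python
import math

def evict_kv(cache, sink_size, window):
    # Attention-sink policy: keep the first `sink_size` positions (the
    # attention sinks) plus the most recent `window` positions; drop
    # everything in between. `cache` stands in for per-position KV entries.
    if len(cache) <= sink_size + window:
        return list(cache)
    return list(cache[:sink_size]) + list(cache[-window:])

def perplexity(nlls):
    # Perplexity is exp of the mean per-token negative log-likelihood,
    # accumulated over every predicted token in the stream.
    return math.exp(sum(nlls) / len(nlls))
```

In the real evaluation loop, each step feeds one token through the model, records its negative log-likelihood, appends the new KV entry, and applies the eviction policy before the next step.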