From 493607e1e5e91b426f6d2782cec30a259d23e425 Mon Sep 17 00:00:00 2001
From: Lunwen He
Date: Mon, 2 Dec 2024 11:12:17 -0800
Subject: [PATCH] Update base for Update on "add eval for attention sink"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This PR adds a function to evaluate the model's perplexity when
AttentionSink is enabled. It is mostly adapted from
https://github.com/mit-han-lab/streaming-llm/blob/main/examples/eval_long_ppl.py,
the script the AttentionSink paper uses for the same measurement.

Differential Revision: [D66474732](https://our.internmc.facebook.com/intern/diff/D66474732/)

Perplexity was measured for the Llama 3.2 1B and 1B_Instruct models on
sequences of up to 40k tokens with AttentionSink enabled:

[Screenshot: perplexity results, 2024-11-25]

[ghstack-poisoned]
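
For reference, below is a minimal sketch of the kind of token-by-token
long-context perplexity loop that eval_long_ppl.py implements: feed one
token at a time, score the next token's log-likelihood, and evict the
middle of the KV cache while keeping the first few "sink" tokens and a
recent window. This is not the actual ExecuTorch implementation; the
window sizes, the evict_kv() helper, and the legacy tuple KV-cache
format are assumptions for illustration.

```python
# Sketch of attention-sink perplexity eval, in the style of
# streaming-llm's examples/eval_long_ppl.py. Assumes the legacy
# tuple-of-(key, value) cache format returned by older versions of
# Hugging Face transformers; names and sizes below are illustrative.
import torch
import torch.nn.functional as F

SINK_TOKENS = 4        # tokens always kept at the start of the cache
WINDOW_TOKENS = 1020   # most recent tokens kept in the cache

def evict_kv(past_key_values):
    """Keep the first SINK_TOKENS and last WINDOW_TOKENS entries of
    every layer's key/value cache, dropping everything in between."""
    new_past = []
    for k, v in past_key_values:
        seq_len = k.size(2)  # shape: (batch, heads, seq, head_dim)
        if seq_len <= SINK_TOKENS + WINDOW_TOKENS:
            new_past.append((k, v))
            continue
        k = torch.cat([k[:, :, :SINK_TOKENS], k[:, :, -WINDOW_TOKENS:]], dim=2)
        v = torch.cat([v[:, :, :SINK_TOKENS], v[:, :, -WINDOW_TOKENS:]], dim=2)
        new_past.append((k, v))
    return tuple(new_past)

@torch.no_grad()
def eval_long_ppl(model, input_ids, device="cpu"):
    """Feed one token at a time, accumulating per-token negative
    log-likelihoods; returns perplexity over the whole sequence."""
    past_key_values = None
    nlls = []
    for i in range(input_ids.size(1) - 1):
        out = model(
            input_ids[:, i : i + 1].to(device),
            past_key_values=past_key_values,
            use_cache=True,
        )
        past_key_values = evict_kv(out.past_key_values)
        target = input_ids[:, i + 1].to(device)
        nlls.append(F.cross_entropy(out.logits[:, -1, :], target))
    return torch.exp(torch.stack(nlls).mean())

# Example usage (model name is illustrative):
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
#   model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
#   ids = tok(long_text, return_tensors="pt").input_ids
#   print(eval_long_ppl(model, ids))
```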