
add eval for attention sink #7070

Merged

Commits on Nov 25, 2024

  1. add eval for attention sink

    This PR adds a function to evaluate the model's perplexity when AttentionSink is enabled.

    It is mostly adapted from https://github.com/mit-han-lab/streaming-llm/blob/main/examples/eval_long_ppl.py, the script the AttentionSink paper uses for the same evaluation (a minimal sketch of that loop follows this commit).
    
    Differential Revision: [D66474732](https://our.internmc.facebook.com/intern/diff/D66474732/)
    
    [ghstack-poisoned]
    helunwencser committed Nov 25, 2024 · a2cc4aa
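    For context, eval_long_ppl.py measures long-context perplexity token by token while keeping the KV cache bounded. Below is a minimal sketch of that loop; the Hugging Face-style model interface and the optional `kv_cache` eviction callable are assumptions for illustration, not the exact API added by this PR.

    ```python
    # Sketch of token-by-token long-context perplexity evaluation in the
    # style of streaming-llm's examples/eval_long_ppl.py. The model interface
    # and the `kv_cache` callable are assumptions, not this PR's exact API.
    import math
    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def eval_long_ppl(model, input_ids, kv_cache=None, max_tokens=40_000):
        """Perplexity over one long sequence, one token at a time."""
        nll_sum, n_tokens = 0.0, 0
        past_key_values = None
        seq_len = min(input_ids.size(1), max_tokens)
        for i in range(seq_len - 1):
            # Feed one token; predict the next from the cached context.
            out = model(input_ids[:, i:i + 1],
                        past_key_values=past_key_values, use_cache=True)
            past_key_values = out.past_key_values
            if kv_cache is not None:
                # Sink-aware eviction keeps the cache (and memory) bounded.
                past_key_values = kv_cache(past_key_values)
            nll_sum += F.cross_entropy(out.logits[:, -1, :],
                                       input_ids[:, i + 1]).item()
            n_tokens += 1
        # Perplexity is the exponentiated mean negative log-likelihood.
        return math.exp(nll_sum / n_tokens)
    ```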

Commits on Nov 26, 2024

  1. Update on "add eval for attention sink"

    Perplexity measured for the Llama 3.2 1B and 1B Instruct models on up to 40k tokens with AttentionSink enabled (the cache-eviction policy that makes runs this long feasible is sketched after this entry):

    ![Perplexity for Llama 3.2 1B and 1B Instruct up to 40k tokens with AttentionSink](https://github.com/user-attachments/assets/ba7118f9-b5d7-4de8-b1fa-7d2ba0646515)
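    For reference, the perplexity plotted above is the standard exponentiated mean negative log-likelihood over the $N$ evaluated tokens:

    $$\mathrm{PPL} = \exp\Bigl(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta(x_i \mid x_{<i})\Bigr)$$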
    
    
    helunwencser committed Nov 26, 2024 · 8c745db
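    As referenced above, AttentionSink keeps long-context evaluation tractable by retaining a few initial "sink" tokens plus a sliding window of the most recent tokens in the KV cache. A minimal sketch of that eviction policy follows; the tensor layout and the `n_sink`/`window` defaults are illustrative assumptions, not this PR's exact implementation.

    ```python
    # Illustrative attention-sink KV-cache eviction (StreamingLLM-style):
    # keep the first `n_sink` tokens plus the most recent `window` tokens,
    # dropping everything in between. Shapes and defaults are assumptions.
    import torch

    def evict_with_sinks(k, v, n_sink=4, window=1020):
        """k, v: [batch, heads, seq_len, head_dim] cached keys/values."""
        seq_len = k.size(2)
        if seq_len <= n_sink + window:
            return k, v  # cache still fits; nothing to evict yet
        keep = torch.cat([
            torch.arange(n_sink, device=k.device),                     # sinks
            torch.arange(seq_len - window, seq_len, device=k.device),  # recent
        ])
        return k.index_select(2, keep), v.index_select(2, keep)
    ```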

Commits on Nov 27, 2024

  1. Update on "add eval for attention sink"

    helunwencser committed Nov 27, 2024 · a3b8d91

Commits on Dec 2, 2024

  1. Update on "add eval for attention sink"

    helunwencser committed Dec 2, 2024 · 38d9e1c
  2. Update on "add eval for attention sink"

    helunwencser committed Dec 2, 2024 · 2f4641f
  3. Update on "add eval for attention sink"

    helunwencser committed Dec 2, 2024 · 42f2282
  4. Update on "add eval for attention sink"

    helunwencser committed Dec 2, 2024 · a5420ae