Can Index use flash attention or xformers? #10
During our actual training process, we used the FlashAttention-2 mechanism; however, because it pulls in many additional dependencies, the training codebase is not well suited to being open-sourced. For better inference performance, we strongly recommend our newly released GGUF quantized version, or alternative open-source options such as vLLM and TensorRT.
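For anyone else taking the vLLM route suggested above, a minimal sketch of what inference might look like (the repo id `IndexTeam/Index-1.9B-Chat` is an assumption here; substitute the actual checkpoint path or Hub id):

```python
# Sketch: serving the model with vLLM for faster inference.
# The model id below is an assumption, not confirmed by the maintainers.
from vllm import LLM, SamplingParams

llm = LLM(model="IndexTeam/Index-1.9B-Chat", trust_remote_code=True)  # assumed repo id
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Hello, who are you?"], params)
print(outputs[0].outputs[0].text)
```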
Well, I will try vLLM. Thanks.
Are you planning to implement HF Transformers' FlashAttention-2 inference in the future?
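For reference, recent Transformers releases expose FlashAttention-2 at load time via `attn_implementation="flash_attention_2"`. Whether this works for Index depends on the model's custom (`trust_remote_code`) attention code supporting it; the repo id below is an assumption, and `flash-attn` must be installed with a compatible GPU. A minimal sketch:

```python
# Sketch: requesting FlashAttention-2 through HF Transformers at load time.
# Requires the flash-attn package and fp16/bf16 weights on a supported GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "IndexTeam/Index-1.9B-Chat"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                 # flash-attn needs fp16/bf16
    attn_implementation="flash_attention_2",   # may be unsupported by remote code
    trust_remote_code=True,
).to("cuda")

inputs = tokenizer("Hello, who are you?", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```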
I want to know whether it can use flash attention. Thanks.