Can Index use flash attention or xformers? #10
During our actual training process, we used the FlashAttention-2 mechanism; however, because it pulls in many additional dependencies, the training codebase is not well suited to being open-sourced. For better inference performance, we strongly recommend our newly released GGUF quantized version, or alternative open-source options such as vLLM and TensorRT.
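For anyone else taking the vLLM route suggested above, a minimal sketch of what inference might look like (the repo id `IndexTeam/Index-1.9B-Chat` is an assumption here; substitute the actual checkpoint path or Hub id):

```python
# Sketch: serving the model with vLLM for faster inference.
# The model id below is an assumption, not confirmed by the maintainers.
from vllm import LLM, SamplingParams

llm = LLM(model="IndexTeam/Index-1.9B-Chat", trust_remote_code=True)  # assumed repo id
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Hello, who are you?"], params)
print(outputs[0].outputs[0].text)
```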
Well, I will try vLLM. Thanks.
Are you planning to implement HF Transformers' FlashAttention-2 inference in the future?
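For reference, recent Transformers releases expose FlashAttention-2 at load time via `attn_implementation="flash_attention_2"`. Whether this works for Index depends on the model's custom (`trust_remote_code`) attention code supporting it; the repo id below is an assumption, and `flash-attn` must be installed with a compatible GPU. A minimal sketch:

```python
# Sketch: requesting FlashAttention-2 through HF Transformers at load time.
# Requires the flash-attn package and fp16/bf16 weights on a supported GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "IndexTeam/Index-1.9B-Chat"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                 # flash-attn needs fp16/bf16
    attn_implementation="flash_attention_2",   # may be unsupported by remote code
    trust_remote_code=True,
).to("cuda")

inputs = tokenizer("Hello, who are you?", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```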
I want to know whether it can use flash attention. Thanks.