
Int4 Support #1104

Open
fmac2000 opened this issue Feb 28, 2023 · 6 comments
Labels
enhancement New feature or request


@fmac2000

Hello Authors,

I apologise for asking a question unrelated to an issue with the repo. However, would you consider supporting a newer paradigm I came across while reading a recent paper?

It looks incredibly promising, and the paper is rather well written I must say, especially considering the performance achieved at such a low precision.
Is anyone on the team able to give this a shot?

@guillaumekln guillaumekln added the enhancement New feature or request label Mar 3, 2023
@guillaumekln
Collaborator

Hello,

Thank you for sharing this paper!

At this time I don't plan on integrating INT4, which would require defining custom kernels with CUTLASS. We currently use cuBLAS for matrix multiplication.
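For context on what INT4 storage involves (independent of the CUTLASS kernel question): two signed 4-bit values are typically packed into one byte, and kernels must unpack and sign-extend them before multiplying. A minimal NumPy sketch of the pack/unpack step — this is an illustration, not CTranslate2 code:

```python
import numpy as np

def pack_int4(values):
    # values: signed ints in [-8, 7]; pack two 4-bit values per byte.
    v = np.asarray(values, dtype=np.int8)
    assert v.size % 2 == 0, "need an even number of values"
    nibbles = (v & 0x0F).astype(np.uint8)  # two's-complement low nibble
    return nibbles[0::2] | (nibbles[1::2] << 4)

def unpack_int4(packed):
    # Recover signed 4-bit values from packed bytes.
    p = np.asarray(packed, dtype=np.uint8)
    lo = (p & 0x0F).astype(np.int16)
    hi = (p >> 4).astype(np.int16)
    # Sign-extend: nibble values >= 8 represent negatives.
    lo = np.where(lo >= 8, lo - 16, lo)
    hi = np.where(hi >= 8, hi - 16, hi)
    out = np.empty(p.size * 2, dtype=np.int8)
    out[0::2] = lo
    out[1::2] = hi
    return out
```

The unpack-and-sign-extend step is exactly what a fused GEMM kernel would have to do inline, which is why cuBLAS (which has no 4-bit input type) is not enough on its own.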

@jncraton
Contributor

jncraton commented Jun 17, 2023

Would it be reasonable to implement this as a CPU-only optimization? GGML supports this on CPU, but I'm not sure if that approach makes sense here or not.
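For reference, GGML's CPU 4-bit formats are block-wise: weights are grouped into small blocks, each stored as 4-bit integers plus one floating-point scale. A rough NumPy sketch of a symmetric Q4_0-style scheme (block size and details are an approximation for illustration, not GGML's exact layout):

```python
import numpy as np

BLOCK = 32  # per-block scale granularity (GGML-style; assumed for illustration)

def quantize_q4(weights):
    # Symmetric 4-bit quantization with one float32 scale per block.
    w = np.asarray(weights, dtype=np.float32).reshape(-1, BLOCK)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_q4(q, scale):
    # Reconstruct approximate float weights.
    return (q.astype(np.float32) * scale).ravel()
```

On CPU the matmul then works on dequantized (or on-the-fly unpacked) blocks, so no special BLAS support is needed — which is why a CPU-only path is plausible even without CUTLASS kernels.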

@Matthieu-Tinycoaching

Hi,

It would be great to have int4 quantization integrated, given the very interesting performance and inference results!

@nickchomey
Contributor

I see that the last few versions of opennmt have added support for 4bit and other quantization methods. https://forum.opennmt.net/t/opennmt-py-v3-3-released-following-3-2-with-plenty-of-new-features/5366

Might any of that be integrated into CTranslate2?

@bil-ash

bil-ash commented Apr 8, 2024

@guillaumekln Yes, 4-bit quantization (on CPU) is a much-needed feature. Any plans to take this up?

@bil-ash

bil-ash commented Apr 8, 2024

Or maybe @ebraraktas can go one step further and implement 2-bit and 3-bit quantization by taking cues from intel/neural-speed#178


6 participants