Tinygrad quantization support #213

varshith15 · 2024-09-10T18:00:39Z

AlexCheema · 2024-09-24T10:09:28Z

Great work @varshith15, I would really like to see this through together with #200. I want to see how viable interoperability between different inference engines is. This would open the door to a lot of really useful workflows, like running tinygrad on Qualcomm chips (where it's by far the fastest) and MLX on Apple M chips (assuming it's faster there; tinygrad might also be the fastest here).

AlexCheema · 2024-09-30T22:53:46Z

Is this still in draft? @varshith15

varshith15 · 2024-10-01T08:26:46Z

@AlexCheema yeah, the tinygrad quantized_mat_mul I wrote is quite slow. I'm debugging that and will need a couple of days.


Varshith added 2 commits September 10, 2024 23:29

llama quantize

b7b911d

api update

3329f26

quantized_linear

827f9e2

Varshith added 7 commits October 1, 2024 21:04

batched_quant_mat_mul

008031d

Merge branch 'main' of https://github.com/varshith15/exo into tinygra…
…d_quantize

46ac7fa

temp

c51898d

revert to dequantize

5380b5c

opt linear

3cddf14

quantization

43ec2f2

fix bias

51b696b
