
Feature Request: GGUF format support #69

Open
orkutmuratyilmaz opened this issue Feb 12, 2024 · 6 comments

@orkutmuratyilmaz

Hello and thanks for this beautiful repo,

Do you have plans to provide a GGUF file? It would be great if we could have one.

Best,
Orkut

@onurgu

onurgu commented Feb 27, 2024

Hi, thanks for the interest. We're working on it 👍🏼

@helizac

helizac commented Jun 13, 2024

Hello, you can find GGUF support at helizac/TURNA_GGUF and a usage example in TURNA_GGUF_USAGE.ipynb.

Currently, only CPU usage is supported, but CUDA support will be implemented if huggingface/candle supports it. For more information, see this related issue.

llama.cpp does not support quantized T5 models at the moment; support will be added here if that changes.

I recommend using Q8_1 or Q8K models for efficiency. At the moment, these models generate 5-6 tokens per second.
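
As a rough sketch (not from the thread), one way to fetch one of these quantized files is with huggingface_hub; the filename below is a placeholder, so check the helizac/TURNA_GGUF file list for the actual Q8_1/Q8K names:

```python
# Rough sketch: download a quantized TURNA GGUF from the Hub.
# The filename is hypothetical -- check helizac/TURNA_GGUF for the real file names.
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="helizac/TURNA_GGUF",
    filename="TURNA_Q8_1.gguf",  # placeholder name
)
print("GGUF saved to:", gguf_path)
```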

@gokceuludogan
Contributor

That's great news! Thank you for your contribution. We look forward to the implementation of CUDA support.

@onurgu

onurgu commented Jun 14, 2024

Thank you @helizac! How did you do this? The llama.cpp repo was not supporting T5 models; I see there were some developments yesterday:

ggerganov/llama.cpp#5763

Did you do it yourself? If so, where is the code?

@helizac

helizac commented Jun 14, 2024

Hello, unfortunately I did not make the development in the llama.cpp issue mentioned above, but I will try that branch and report back in this issue. I implemented it in Rust with the huggingface/candle framework, as follows. I saw that CUDA support could be provided in some of the framework's examples, but I ran into problems during implementation. I think CUDA support can be added with a few changes to:
https://github.com/huggingface/candle/blob/main/candle-examples/examples/quantized-t5/main.rs

Related issue: huggingface/candle#2266

The .gguf conversion process (currently CPU-only) is below.

RUST_GGUF_CONVERT:
https://colab.research.google.com/drive/1s97zTs8hfT0wyGTDHvs8cVOm9mVgXd9G?usp=sharing

With the methods in this notebook, TURNA can be used in .gguf format.
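
As a quick sanity check on the converted file (a sketch only, assuming the gguf Python package and a hypothetical output path), the metadata keys and tensor names can be listed like this:

```python
# Sketch: inspect a converted GGUF file with the gguf Python package.
# "turna.gguf" is a hypothetical output path from the notebook above.
from gguf import GGUFReader

reader = GGUFReader("turna.gguf")

# Print the metadata keys stored in the file header.
for key in reader.fields:
    print("metadata:", key)

# Print each tensor's name, shape, and quantization type.
for tensor in reader.tensors:
    print("tensor:", tensor.name, tensor.shape, tensor.tensor_type)
```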

@helizac

helizac commented Jun 14, 2024

So, I tried the new t5 branch -> https://github.com/fairydreaming/llama.cpp/tree/t5, but it is not suitable for TURNA at the moment.

First, the t5 branch expects a spiece.model file, but TURNA uses HF tokenizers, so I adapted the code to the HF tokenizer. Then I ran into a second problem. Because the tensor mappings are defined as
MODEL_TENSOR.DEC_FFN_UP: "decoder.block.{bid}.layer.2.DenseReluDense.wi"
and
MODEL_TENSOR.ENC_FFN_UP: "encoder.block.{bid}.layer.1.DenseReluDense.wi"

in TENSOR_MODELS, it didn't work, because TURNA expects:

INFO:hf-to-gguf:dec.blk.0.ffn_up.weight, torch.float32 --> F32, shape = {1024, 2816}
INFO:hf-to-gguf:dec.blk.0.dense_relu_dense.wi_1.weight, torch.float32 --> F32, shape = {1024, 2816}
INFO:hf-to-gguf:enc.blk.0.ffn_up.weight, torch.float32 --> F32, shape = {1024, 2816}
INFO:hf-to-gguf:enc.blk.0.dense_relu_dense.wi_1.weight, torch.float32 --> F32, shape = {1024, 2816}

I defined the tensors on my own and was able to export a .gguf output, but llama.cpp won't load it, failing with "error loading model vocabulary: Index out of array bounds in XCDA array!". To fix this, the relevant functions in the llama.cpp file would need to be examined in detail and rewritten.
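
For reference, a rough sketch of the kind of renaming a TURNA-aware converter would need for the gated FFN weights; the wi_0 -> ffn_gate and wi_1 -> ffn_up assignment and the enc/dec block prefixes follow the log above as assumptions, and the exact constants in the t5 branch may differ:

```python
import re

# Sketch: map Hugging Face gated-FFN tensor names (wi_0 / wi_1, as used by TURNA)
# to GGUF-style block names like those in the hf-to-gguf log above.
# The wi_0 -> ffn_gate / wi_1 -> ffn_up assignment is an assumption.
def map_gated_ffn_name(hf_name: str):
    m = re.fullmatch(
        r"(encoder|decoder)\.block\.(\d+)\.layer\.\d+\.DenseReluDense\.wi_([01])\.weight",
        hf_name,
    )
    if m is None:
        return None  # not a gated FFN weight; leave it to the existing mapping
    side = "enc" if m.group(1) == "encoder" else "dec"
    part = "ffn_gate" if m.group(3) == "0" else "ffn_up"
    return f"{side}.blk.{m.group(2)}.{part}.weight"

# Prints "dec.blk.0.ffn_up.weight"
print(map_gated_ffn_name("decoder.block.0.layer.2.DenseReluDense.wi_1.weight"))
```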

For now, the huggingface/candle Rust implementation described above will be more comfortable to use. If GPU support arrives soon, the model can easily be used this way:
https://colab.research.google.com/drive/1s97zTs8hfT0wyGTDHvs8cVOm9mVgXd9G?usp=sharing (RUST_GGUF_CONVERT)
