bentoml/openllm-models

The default model repository of openllm

openllm already includes this repo (main branch) by default, so no setup is needed to use it.

If you want newer models that have not yet been tested, add the nightly branch:

openllm repo add nightly https://github.com/bentoml/openllm-models@nightly

Supported Models

$ openllm repo update
$ openllm model list
model              version                                          repo     required GPU RAM    platforms
-----------------  -----------------------------------------------  -------  ------------------  -----------
codestral          codestral:22b-v0.1-fp16-e452                     default  80G                 linux
gemma              gemma:2b-instruct-fp16-5eac                      default  12G                 linux
                   gemma:7b-instruct-fp16-c20f                      default  24G                 linux
                   gemma:7b-instruct-awq-4bit-80de                  default  12G                 linux
gemma2             gemma2:9b-instruct-fp16-aadd                     default  24G                 linux
                   gemma2:27b-instruct-fp16-376d                    default  80G                 linux
jamba1.5           jamba1.5:mini-fp16-dddc                          default  80Gx4               linux
llama2             llama2:7b-chat-fp16-3a8c                         default  16G                 linux
                   llama2:7b-chat-awq-4bit-0c30                     default  12G                 linux
                   llama2:13b-chat-fp16-e833                        default  40G                 linux
                   llama2:70b-chat-fp16-30af                        default  80Gx2               linux
llama3             llama3:8b-instruct-fp16-7dd1                     default  24G                 linux
                   llama3:8b-instruct-awq-4bit-0947                 default  12G                 linux
                   llama3:70b-instruct-fp16-2370                    default  80Gx2               linux
                   llama3:70b-instruct-awq-4bit-b148                default  80G                 linux
llama3.1           llama3.1:8b-instruct-fp16-210e                   default  24G                 linux
                   llama3.1:8b-instruct-awq-4bit-508e               default  12G                 linux
                   llama3.1:70b-instruct-fp16-5526                  default  80Gx2               linux
                   llama3.1:70b-instruct-awq-4bit-3330              default  80G                 linux
                   llama3.1:405b-instruct-awq-4bit-3f82             default  80Gx4               linux
llama3.1-nemotron  llama3.1-nemotron:70b-instruct-fp16-51b4         default  80Gx2               linux
llama3.2           llama3.2:1b-instruct-fp16-69ad                   default  12G                 linux
                   llama3.2:1b-instruct-ggml-fp16-linux-08c5        default                      linux
                   llama3.2:1b-instruct-ggml-fp16-darwin-12f1       default                      macos
                   llama3.2:3b-instruct-fp16-785f                   default  12G                 linux
                   llama3.2:11b-vision-instruct-1fbd                default  80G                 linux
mistral            mistral:7b-instruct-fp16-28ad                    default  24G                 linux
                   mistral:7b-instruct-awq-4bit-f24d                default  12G                 linux
                   mistral:24b-instruct-nemo-e080                   default  80G                 linux
mistral-large      mistral-large:123b-instruct-fp16-c0fa            default  80Gx4               linux
                   mistral-large:123b-instruct-awq-4bit-a39f        default  80G                 linux
mixtral            mixtral:8x7b-instruct-v0.1-fp16-e7ea             default  80Gx2               linux
                   mixtral:8x7b-instruct-v0.1-awq-4bit-29b9         default  40G                 linux
phi3               phi3:3.8b-instruct-fp16-bdd2                     default  12G                 linux
                   phi3:3.8b-instruct-ggml-q4-ccda                  default                      macos
pixtral            pixtral:12b-240910-8551                          default  80G                 linux
qwen2              qwen2:0.5b-instruct-fp16-7c8c                    default  12G                 linux
                   qwen2:1.5b-instruct-fp16-32de                    default  12G                 linux
                   qwen2:7b-instruct-fp16-c1fc                      default  24G                 linux
                   qwen2:7b-instruct-awq-4bit-78b4                  default  12G                 linux
                   qwen2:57b-a14b-instruct-fp16-b977                default  80Gx2               linux
                   qwen2:72b-instruct-fp16-d6dc                     default  80Gx2               linux
                   qwen2:72b-instruct-awq-4bit-7a98                 default  80G                 linux
qwen2.5            qwen2.5:0.5b-instruct-fp16-3fbc                  default  12G                 linux
                   qwen2.5:1.5b-instruct-fp16-fba6                  default  12G                 linux
                   qwen2.5:3b-instruct-fp16-453b                    default  12G                 linux
                   qwen2.5:7b-instruct-fp16-b253                    default  24G                 linux
                   qwen2.5:14b-instruct-fp16-7f0b                   default  80G                 linux
                   qwen2.5:14b-instruct-ggml-q4-darwin-5f24         default                      macos
                   qwen2.5:14b-instruct-ggml-q8-darwin-8b4f         default                      macos
                   qwen2.5:32b-instruct-fp16-61fc                   default  80G                 linux
                   qwen2.5:32b-instruct-awq-4bit-f482               default  40G                 linux
                   qwen2.5:32b-instruct-ggml-fp16-darwin-028b       default                      macos
                   qwen2.5:72b-instruct-fp16-87a8                   default  80Gx2               linux
                   qwen2.5:72b-instruct-ggml-q4-darwin-7e2c         default                      macos
qwen2.5-coder      qwen2.5-coder:7b-instruct-9c7d                   default  24G                 linux
                   qwen2.5-coder:7b-instruct-ggml-fp16-linux-6e86   default                      linux
                   qwen2.5-coder:7b-instruct-ggml-fp16-darwin-dc3a  default                      macos
                   qwen2.5-coder:32b-instruct-109c                  default  80G                 linux
qwen2vl            qwen2vl:7b-instruct-fp16-ceb2                    default  24G                 linux
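
The "required GPU RAM" column uses two forms: a single value such as 24G, and a multiplied value such as 80Gx4. Assuming the second form means four GPUs with 80 GB each (the table itself does not spell this out), the total memory a model needs can be worked out with a small shell helper. The function name total_gpu_ram_gb is hypothetical and is not part of openllm; this is only a sketch for reading the table.

```shell
# Hypothetical helper, not part of openllm.
# Assumes "80Gx4" means 4 GPUs with 80 GB each, and "24G" means one 24 GB GPU.
total_gpu_ram_gb() {
  spec="$1"
  # Rows with an empty column (the ggml builds above) list no GPU requirement.
  if [ -z "$spec" ]; then
    echo 0
    return
  fi
  per_gpu="${spec%%G*}"            # "80Gx4" -> "80"
  case "$spec" in
    *x*) count="${spec##*x}" ;;    # "80Gx4" -> "4"
    *)   count=1 ;;
  esac
  echo $(( per_gpu * count ))
}

total_gpu_ram_gb 24G     # prints 24
total_gpu_ram_gb 80Gx4   # prints 320
```

Under that reading, llama3.1:405b-instruct-awq-4bit-3f82 (80Gx4) needs 320 GB of GPU memory in total, spread across four devices.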

Development Guide

Open PRs to the nightly branch to add new models or update existing models.

You can also fork this repo and add your own models.

To use your own model repository, register it with openllm repo add, passing a name and a repository URL in the same form as the nightly example above.