Add a model_server example podman-llm
This is a tool that was written to be as simple as ollama; in its
simplest form it's:

podman-llm run granite

Signed-off-by: Eric Curtin <[email protected]>
ericcurtin committed Jul 10, 2024
1 parent 241e0e4 commit 51baa8b
Showing 1 changed file with 89 additions and 0 deletions: model_servers/podman-llm/README.md

# podman-llm

The goal of podman-llm is to make AI even more boring.

## Install

Install podman-llm by running this one-liner:

```
curl -fsSL https://raw.githubusercontent.com/ericcurtin/podman-llm/s/install.sh | sudo bash
```
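
If you would rather review the install script before it runs as root, the same file can be downloaded and executed in two steps (a minimal variant of the one-liner above):

```
curl -fsSL https://raw.githubusercontent.com/ericcurtin/podman-llm/s/install.sh -o install.sh
# inspect install.sh, then:
sudo bash install.sh
```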

## Usage

### Running Models

You can run a model using the `run` command. This will start an interactive session where you can query the model.

```
$ podman-llm run granite
> Tell me about podman in less than ten words
A fast, secure, and private container engine for modern applications.
>
```

### Serving Models

To serve a model over HTTP, use the `serve` command. This starts an HTTP server that handles requests to the model.

```
$ podman-llm serve granite
...
{"tid":"140477699799168","timestamp":1719579518,"level":"INFO","function":"main","line":3793,"msg":"HTTP server listening","n_threads_http":"11","port":"8080","hostname":"127.0.0.1"}
...
```
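
Once the server is up, you can query it over plain HTTP. The log line above comes from llama.cpp's built-in server, so a request against its `/completion` endpoint should work; a minimal sketch, assuming the `127.0.0.1:8080` address reported in the log:

```
curl -s http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Tell me about podman in less than ten words", "n_predict": 32}'
```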

## Model library

| Model | Parameters | Run |
| ------------------ | ---------- | ------------------------------ |
| granite | 3B | `podman-llm run granite` |
| mistral | 7B | `podman-llm run mistral` |
| merlinite | 7B | `podman-llm run merlinite` |

## Containerfile Example

Here is an example Containerfile:

```
FROM quay.io/podman-llm/podman-llm:41
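# Fetch the model from Hugging Face at build time so it is baked into the image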
RUN llama-main --hf-repo ibm-granite/granite-3b-code-instruct-GGUF -m granite-3b-code-instruct.Q4_K_M.gguf
LABEL MODEL=/granite-3b-code-instruct.Q4_K_M.gguf
```

The `LABEL MODEL` line is important: it tells podman-llm where to find the `.gguf` file inside the image.

We then build it via:

```
podman-llm build granite
```
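
podman-llm reads that label back from the image to locate the model. You can check it yourself with plain podman; a quick sketch, assuming the build tags the image `granite`:

```
podman inspect --format '{{ index .Config.Labels "MODEL" }}' granite
```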

## Diagram

```
+----------------+    +------------------------+    +--------------------+
|                |    | Pull runtime layer     |    | Pull model layer   |
| podman-llm run | -> | for llama.cpp          | -> | with granite       |
|                |    | (CPU, Vulkan, AMD,     |    |                    |
+----------------+    |  Nvidia, Intel,        |    |--------------------|
                      |  Apple Silicon, etc.)  |    | Repo options:      |
                      +------------------------+    +--------------------+
                                                         |             |
                                                         v             v
                                                  +--------------+  +---------+
                                                  | Hugging Face |  | quay.io |
                                                  +--------------+  +---------+
                                                          \             /
                                                           \           /
                                                            v         v
                                                        +-----------------+
                                                        | Start container |
                                                        | with llama.cpp  |
                                                        | and granite     |
                                                        | model           |
                                                        +-----------------+
```
