[WIP] Add a model_server example podman-llm #649

Open
wants to merge 1 commit into main from podman-llm

Conversation

@ericcurtin (Contributor)

This is a tool that was written to be as simple as Ollama; in its simplest form it's:

podman-llm run granite

@rhatdan (Member) commented Jun 29, 2024

Does this tool have an upstream? Where is the REPO?

Not sure I love the name.

@ericcurtin (Contributor, Author) commented Jun 29, 2024

Yeah, happy to rename, just needed to name it something:

https://github.com/ericcurtin/podman-llm

Could be llmc, llm-container, llm-oci, podllm? I really don't mind

It requires a couple of small patches to llama.cpp also, but nothing major:

https://github.com/ericcurtin/llama.cpp/tree/podman-llm

@rhatdan (Member) commented Jul 1, 2024

@ericcurtin (Contributor, Author) commented Jul 1, 2024

I was working with Ollama, but I worry about the long-term future there as regards external contributions:

https://github.com/ollama/ollama/pulls/ericcurtin

I fixed a lot of issues around OSTree-based OSes, podman support, and Fedora support in general... but I just don't think the Ollama folks are genuinely interested in external contributions (they weren't complex reviews).

So I removed the middle component, Ollama itself, since Ollama is a llama.cpp wrapper. This uses llama.cpp pretty much directly, which kind of shows that the Ollama layer actually isn't doing a whole lot.

What I really liked about Ollama is that it simplified running LLMs to:

ollama run mistral

so that's what I was going for here. I think an Ollama clone built directly against the llama.cpp library could do very well.

And this is daemonless (unlike Ollama): no clients, servers, etc. unless you actually want to serve, and it's zippier as a result.
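
For illustration, the two modes look roughly like this (the serve form is an assumption based on the "run/serve" wording used elsewhere in this thread, not a committed interface):

# Interactive, daemonless chat: pull the runtime image plus the model,
# then run llama.cpp directly
podman-llm run granite

# Nothing long-running exists unless you explicitly ask to serve
podman-llm serve granite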

@ericcurtin (Contributor, Author)

This review in particular was super easy and would give rpm-ostree/bootc OS support:

https://github.com/ollama/ollama/pull/3615/files

@ericcurtin (Contributor, Author)

There's obvious overlap with instructlab...

This is like a containerized, daemonless, simplified instructlab for dummies, kind of like Ollama.

@ericcurtin (Contributor, Author) commented Jul 1, 2024

If we felt this idea was worth pursuing, there would probably be plenty of breaking changes still to come. One idea we were thinking about: GGUFs would be delivered as single-file "FROM scratch" images (in their own gguf container store, to be used with the podman-llm:41, podman-llm-amd:41 or podman-llm-nvidia:41 container images).

So every "podman-llm run/serve" invocation is made up of some container image runtime (AMD, Nvidia, CPU, etc.) and a .gguf file, which is delivered as a separate container image or downloaded from Hugging Face.

It's like Ollama but with no custom "Modelfile" syntax (I think standard Containerfiles are better) and no special OCI format: a .gguf is just a .gguf, whether it comes from a container image or straight from Hugging Face.
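
As a sketch of what that could mean in practice (the file and tag names here are placeholders, not a finished design), packaging a model would be a two-line Containerfile:

# assumes granite.gguf is already in the build context
cat > Containerfile <<'EOF'
FROM scratch
COPY granite.gguf /granite.gguf
EOF
podman build -t granite-gguf .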

Some name change, like @rhatdan is proposing, to whatever people think sounds cool :)

But this is just a 20% project for me, so I'd like to get people's opinions on whether something like this is worthwhile, etc.

@ericcurtin force-pushed the podman-llm branch 2 times, most recently from e6ed9c5 to a39f1ee on July 7, 2024
@Gregory-Pereira (Collaborator)

Normally, I would not be in favor of including contributions that solely document integration with external software that is not essential to or used in the recipes. However, there does seem to be some alignment here around bootc support, and in my eyes this can be considered as much (or more) a repo showcasing bootc as one showcasing AI recipes. This, coupled with the slow adoption in Ollama, reinforces that this should live somewhere within the containers org if it gets accepted. I'm in favor of pushing this through and keeping it solely to documentation in the model_servers dir for now. I suggest you also get another owner's buy-in though, because this is not so much about whether the docs pass or not; it's about whether we want to open the door to this type of contribution.

@ericcurtin (Contributor, Author) commented Jul 8, 2024

> Normally, I would not be in favor of including contributions that solely document integration with external software that is not essential to or used in the recipes. However, there does seem to be some alignment here around bootc support, and in my eyes this can be considered as much (or more) a repo showcasing bootc as one showcasing AI recipes. This, coupled with the slow adoption in Ollama, reinforces that this should live somewhere within the containers org if it gets accepted. I'm in favor of pushing this through and keeping it solely to documentation in the model_servers dir for now. I suggest you also get another owner's buy-in though, because this is not so much about whether the docs pass or not; it's about whether we want to open the door to this type of contribution.

bootc is pretty useful for AI use cases, even just for having the Nvidia dependencies pre-installed, which are not always trivial to install in a deployment.

podman-llm (to be renamed) would work within a bootc image, or a non-bootc image for that matter. The only real dependency it has is that podman (or docker) is installed.
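
As a rough sketch only (the base image and install steps are assumptions, not something this PR ships), layering it into a bootc image could look like:

cat > Containerfile <<'EOF'
# any bootc base would do; most already ship podman
FROM quay.io/centos-bootc/centos-bootc:stream9
RUN dnf -y install podman && dnf clean all
# hypothetical: drop the podman-llm script into the image
COPY podman-llm /usr/bin/podman-llm
EOF
podman build -t bootc-llm-host .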

@Gregory-Pereira (Collaborator) left a comment

Let's go with this for now.

@ericcurtin (Contributor, Author) commented Jul 8, 2024

@tumido's feedback could also be interesting; looking at the upcoming devconf.us talks, he is speaking about:

"Store AI/ML models efficiently with OCI Artifacts"

which is one of the things I am trying to do here; maybe we can combine efforts :)

I played around with a couple of ideas with different pros/cons: podman volumes, FROM scratch images, just simple container image inheritance. Right now it's a bind-mounted directory ($HOME/.cache/huggingface/) used to share .gguf files between multiple images. I bet @tumido has some interesting ideas :)
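
Concretely, the current sharing scheme amounts to something like the following (the runtime tag and in-container path are illustrative):

# every runtime image sees the same host-side cache, so a .gguf
# downloaded once is reused by the CPU, AMD and Nvidia runtimes alike
podman run --rm -it \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface:Z" \
  podman-llm:41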

@ericcurtin force-pushed the podman-llm branch 3 times, most recently from 997c8e0 to 432e91b on July 10, 2024
@ericcurtin (Contributor, Author)

Updated README.md diagram to highlight the value of pulling different runtimes:

+--------------------+
| Pull runtime layer |
| for llama.cpp      |
| (CPU, Vulkan, AMD, |
|  Nvidia, Intel or  |
|  Apple Silicon)    |
+--------------------+
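
Purely as an illustration of why that layer split matters (these tags follow the podman-llm:41 / podman-llm-amd:41 / podman-llm-nvidia:41 naming idea from earlier in the thread and are not published images):

# only the runtime layer differs per accelerator; the model image stays the same
podman pull podman-llm:41          # presumably the CPU/Vulkan llama.cpp build
podman pull podman-llm-amd:41      # ROCm build
podman pull podman-llm-nvidia:41   # CUDA build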

This is a tool that was written to be as simple as ollama; in its
simplest form it's:

podman-llm run granite

Signed-off-by: Eric Curtin <[email protected]>
@rhatdan changed the title from "Add a model_server example podman-llm" to "[WIP] Add a model_server example podman-llm" on Jul 11, 2024