[WIP] Add a model_server example podman-llm #649

Open
wants to merge 1 commit into main from podman-llm

Conversation

@ericcurtin (Contributor)

This is a tool that was written to be as simple as Ollama; in its simplest form it's:

podman-llm run granite

@rhatdan (Member) commented Jun 29, 2024

Does this tool have an upstream? Where is the REPO?

Not sure I love the name.

@ericcurtin (Contributor, Author) commented Jun 29, 2024

Yeah, happy to rename, just needed to name it something:

https://github.com/ericcurtin/podman-llm

Could be llmc, llm-container, llm-oci, podllm? I really don't mind

It requires a couple of small patches to llama.cpp also, but nothing major:

https://github.com/ericcurtin/llama.cpp/tree/podman-llm

@rhatdan (Member) commented Jul 1, 2024

@ericcurtin (Contributor, Author) commented Jul 1, 2024

I was working with Ollama, but I worry about the long-term future there as regards external contributions:

https://github.com/ollama/ollama/pulls/ericcurtin

I fixed a lot of issues around OSTree-based OSes, podman support, and Fedora support in general... but I just don't think the Ollama folks are genuinely interested in external contributions (they weren't complex reviews).

So I removed the middle component, Ollama itself, since Ollama is a llama.cpp wrapper. This uses llama.cpp pretty much directly, which kind of shows that the Ollama layer actually isn't doing a whole lot.

What I really liked about Ollama is that it simplified running LLMs to:

ollama run mistral

so that's what I was going for here. I think an Ollama clone built directly against the llama.cpp library could do very well.

And this is daemonless (unlike Ollama): no clients, servers, etc. unless you actually want to serve, and it's zippier as a result.
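
For illustration, the two modes look roughly like this (the serve form is an assumption based on the "run/serve" wording used elsewhere in this thread, not a committed interface):

# Interactive, daemonless chat: pull the runtime image plus the model,
# then run llama.cpp directly
podman-llm run granite

# Nothing long-running exists unless you explicitly ask to serve
podman-llm serve granite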

@ericcurtin (Contributor, Author)

This review in particular was super easy and would give rpm-ostree/bootc OS support:

https://github.com/ollama/ollama/pull/3615/files

@ericcurtin (Contributor, Author)

There's obvious overlap with instructlab...

This is like a containerized, daemonless, simplified instructlab for dummies, kind of like Ollama.

@ericcurtin (Contributor, Author) commented Jul 1, 2024

If we felt this idea was worth pursuing, there would probably be plenty of breaking changes still to come. One idea we were thinking about: GGUFs would be delivered as single-file "FROM scratch" images (in their own gguf container store, to be used with the podman-llm:41, podman-llm-amd:41 or podman-llm-nvidia:41 container images).

So every "podman-llm run/serve" invocation is made up of some container image runtime (AMD, Nvidia, CPU, etc.) and a .gguf file, which is delivered as a separate container image or downloaded from Hugging Face.

It's like Ollama but with no custom "Modelfile" syntax (I think standard Containerfiles are better) and no special OCI format: a .gguf is just a .gguf, whether it comes from a container image or straight from Hugging Face.
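
As a sketch of what that could mean in practice (the file and tag names here are placeholders, not a finished design), packaging a model would be a two-line Containerfile:

# assumes granite.gguf is already in the build context
cat > Containerfile <<'EOF'
FROM scratch
COPY granite.gguf /granite.gguf
EOF
podman build -t granite-gguf .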

Some name change, like @rhatdan is proposing, to whatever people think sounds cool :)

But this is just a 20% project for me, so I'd like to get people's opinions on whether something like this is worthwhile, etc.

@ericcurtin force-pushed the podman-llm branch 2 times, most recently from e6ed9c5 to a39f1ee on July 7, 2024
@Gregory-Pereira (Collaborator)

Normally, I would not be in favor of including contributions that solely document integration with external software that is not essential to or used in the recipes. However, there does seem to be some alignment here around bootc support, and in my eyes this can be considered as much (or more) a repo showcasing bootc as one showcasing AI recipes. This, coupled with the slow adoption in Ollama, reinforces that this should live somewhere within the containers org if it gets accepted. I'm in favor of pushing this through and keeping it solely to documentation in the model_servers dir for now. I suggest you also get another owner's buy-in though, because this is not so much about whether the docs pass or not; it's about whether we want to open the door to this type of contribution.

@ericcurtin (Contributor, Author) commented Jul 8, 2024

> Normally, I would not be in favor of including contributions that solely document integration with external software that is not essential to or used in the recipes. However, there does seem to be some alignment here around bootc support, and in my eyes this can be considered as much (or more) a repo showcasing bootc as one showcasing AI recipes. This, coupled with the slow adoption in Ollama, reinforces that this should live somewhere within the containers org if it gets accepted. I'm in favor of pushing this through and keeping it solely to documentation in the model_servers dir for now. I suggest you also get another owner's buy-in though, because this is not so much about whether the docs pass or not; it's about whether we want to open the door to this type of contribution.

bootc is pretty useful for AI use cases, even just for having the Nvidia dependencies pre-installed, which are not always trivial to install in a deployment.

podman-llm (to be renamed) would work within a bootc image, or a non-bootc image for that matter. The only real dependency it has is that podman (or docker) is installed.
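
As a rough sketch only (the base image and install steps are assumptions, not something this PR ships), layering it into a bootc image could look like:

cat > Containerfile <<'EOF'
# any bootc base would do; most already ship podman
FROM quay.io/centos-bootc/centos-bootc:stream9
RUN dnf -y install podman && dnf clean all
# hypothetical: drop the podman-llm script into the image
COPY podman-llm /usr/bin/podman-llm
EOF
podman build -t bootc-llm-host .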

@Gregory-Pereira (Collaborator) left a comment

Let's go with this for now.

@ericcurtin (Contributor, Author) commented Jul 8, 2024

@tumido's feedback could also be interesting; looking at the upcoming devconf.us talks, he is speaking about:

"Store AI/ML models efficiently with OCI Artifacts"

which is one of the things I am trying to do here; maybe we can combine efforts :)

I played around with a couple of ideas with different pros/cons: podman volumes, FROM scratch images, just simple container image inheritance. Right now it's a bind-mounted directory ($HOME/.cache/huggingface/) used to share .gguf files between multiple images. I bet @tumido has some interesting ideas :)
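
Concretely, the current sharing scheme amounts to something like the following (the runtime tag and in-container path are illustrative):

# every runtime image sees the same host-side cache, so a .gguf
# downloaded once is reused by the CPU, AMD and Nvidia runtimes alike
podman run --rm -it \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface:Z" \
  podman-llm:41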

@ericcurtin force-pushed the podman-llm branch 3 times, most recently from 997c8e0 to 432e91b on July 10, 2024
@ericcurtin (Contributor, Author)

Updated README.md diagram to highlight the value of pulling different runtimes:

+--------------------+
| Pull runtime layer |
| for llama.cpp      |
| (CPU, Vulkan, AMD, |
|  Nvidia, Intel or  |
|  Apple Silicon)    |
+--------------------+
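
Purely as an illustration of why that layer split matters (these tags follow the podman-llm:41 / podman-llm-amd:41 / podman-llm-nvidia:41 naming idea from earlier in the thread and are not published images):

# only the runtime layer differs per accelerator; the model image stays the same
podman pull podman-llm:41          # presumably the CPU/Vulkan llama.cpp build
podman pull podman-llm-amd:41      # ROCm build
podman pull podman-llm-nvidia:41   # CUDA build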

This is a tool that was written to be as simple as ollama; in its
simplest form it's:

podman-llm run granite

Signed-off-by: Eric Curtin <[email protected]>
@rhatdan changed the title from "Add a model_server example podman-llm" to "[WIP] Add a model_server example podman-llm" on Jul 11, 2024