ollama support #91
/kind feature |
RE: |
/assign @qinguoyi |
I will finish this work by 11.2 |
Hey @qinguoyi, if you have any design details, it would be better to share them in this issue so we can discuss them and avoid unnecessary refactorings. Thanks! |
Let's look at some background first.
Considering llmaz, our goal is to support ollama inference with custom model files, including GGUF and safetensors (imported directly). Let's look at the difficulty of implementing this: https://github.com/ollama/ollama/blob/main/docs/import.md . According to the official docs, if we import a custom model file, we need to execute some shell commands after starting the ollama server:
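Per the import docs, those commands boil down to writing a Modelfile that points at the local weights and registering it with ollama; a minimal sketch, where the path and model name are placeholders:
# point a Modelfile at the local GGUF weights (placeholder path)
cat <<EOF > Modelfile
FROM /path/to/model.gguf
EOF
# register and run the model under a chosen name (placeholder name)
ollama create my-model -f Modelfile
ollama run my-model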
Now let's look at the official ollama image's command. According to the image inspect (see the sketch below), the entrypoint only starts the ollama server, so if we want to execute multiple commands we need to wrap them in a script. We have two containers, an init container for downloading models and a main container for starting the inference service; so, we have two solutions to implement it.
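For reference, a quick way to confirm that is inspecting the image; this is a sketch and the exact output depends on the image tag:
# print the entrypoint and default command of the official image
docker inspect ollama/ollama --format '{{.Config.Entrypoint}} {{.Config.Cmd}}'
# typically shows an entrypoint like [/bin/ollama] with the default command [serve]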
In summary, we choose the second method to implement it. The specific script commands are as follows:
#!/bin/bash
# start ollama server
ollama serve &
# wait a few seconds for the server to come up
sleep 5
# check input params
if [ -z "$1" ]; then
echo "please input GGUF model file path,such as:./start_ollama.sh /path/to/model.gguf"
exit 1
fi
MODEL_PATH=$1
# determine whether the input is a file path or a directory path
if [ -f "$MODEL_PATH" ]; then
    echo "input file path: $MODEL_PATH"
    # check whether the file has a .gguf suffix
    if [[ "$MODEL_PATH" == *.gguf ]]; then
        echo "file exists and has the .gguf suffix: $MODEL_PATH"
    else
        echo "file exists but does not have the .gguf suffix: $MODEL_PATH"
        exit 1
    fi
elif [ -d "$MODEL_PATH" ]; then
    echo "input dir path: $MODEL_PATH"
    # check whether the dir contains any .safetensors files
    SAFETENSORS_FILES=$(find "$MODEL_PATH" -type f -name "*.safetensors")
    if [ -z "$SAFETENSORS_FILES" ]; then
        echo "dir exists but contains no .safetensors files"
        exit 1
    else
        echo "dir exists and contains .safetensors files:"
        echo "$SAFETENSORS_FILES"
    fi
else
    echo "input path is neither a file nor a directory: $MODEL_PATH"
    exit 1
fi
# create the Modelfile
MODEL_FILE="Modelfile"
cat <<EOF > "$MODEL_FILE"
FROM "$MODEL_PATH"
EOF
echo "Modelfile created successfully:"
cat "$MODEL_FILE"
# run ollama create
if [ -z "$2" ]; then
    echo "please input the model name"
    exit 1
fi
MODEL_NAME=$2
ollama create "$MODEL_NAME" -f "$MODEL_FILE"
if [ $? -ne 0 ]; then
    echo "ollama create failed"
    exit 1
fi
# run ollama run
ollama run "$MODEL_NAME"
# keep the shell alive so the container process does not exit
while true; do
    sleep 3600
done
Let's see the result. Here we take GGUF file mounting as an example. In order to start faster, we use the minimized image alpine/ollama:latest:
{{- if .Values.backendRuntime.install -}}
apiVersion: inference.llmaz.io/v1alpha1
kind: BackendRuntime
metadata:
  labels:
    app.kubernetes.io/name: backendruntime
    app.kubernetes.io/part-of: llmaz
    app.kubernetes.io/created-by: llmaz
  name: ollama
spec:
  commands:
    - sh
    - /workspace/models/llmaz-scripts/start_ollama.sh
  image: alpine/ollama
  version: latest
  # Do not edit the preset argument name unless you know what you're doing.
  # Free to add more arguments with your requirements.
  args:
    - name: default
      flags:
        - "{{`{{ .ModelPath }}`}}"
        - "{{`{{ .ModelName }}`}}"
  resources:
    requests:
      cpu: 2
      memory: 4Gi
    limits:
      cpu: 4
      memory: 8Gi
{{- end }}
Then we port-forward 11434 to 8080 and verify the service (a quick check is sketched after this comment). So, this is my idea to support ollama. I would like to hear more ideas on how to support it more elegantly. PTAL @kerthcet |
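For example, a minimal verification could look like the following; the service name and model name are placeholders that depend on how the deployment is named in your cluster:
# forward the ollama port (11434) to local 8080; replace the service name with your own
kubectl port-forward svc/<your-ollama-service> 8080:11434 &
# list the registered models and send a test prompt ("mymodel" stands for the name passed to the script)
curl http://localhost:8080/api/tags
curl http://localhost:8080/api/generate -d '{"model": "mymodel", "prompt": "hello", "stream": false}'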
Thanks for the detailed information, it's really clear. Based on the fact that ollama is mostly designed for local deployment rather than for the cloud, and that it's based on llama.cpp, which we already support, my suggestion is to start with the simplest approach and see whether it is popular with users, then step up to the next level based on feedback, rather than making it perfect on day 1. So maybe we can start with that. Again, from what I've learned so far, I haven't seen a lot of users deploy ollama in the cloud; this is a suboptimal solution, just because we can easily integrate with inference backends, so I'll make it a TODO. WDYT? |
Thanks for your kind reply. I figured out what had confused me so much: I thought the Modelfile was mandatory. In addition, I have no idea how to ignore the Modelfile. For example, we could add an Ignore field in Playground; when ignore is true, we would only run the Playground without binding a model? But the Playground, Service and BackendRuntime controllers have a lot of code binding the model and model[0], so there would be a lot of work to ignore the model. Do you have any suggestions for the implementation? |
A simple implementation would look like:
Any suggestions? |
I fully agree, this seems like the least invasive solution. I'll work on getting it done as soon as possible |
Could we close this issue? @kerthcet |
Yes, we can. One tip: you can set the PR description like /close |
What would you like to be added:
ollama provides an SDK for integrations, so we can easily integrate with it. One of the benefits I can think of is that ollama maintains a bunch of quantized models we can leverage.
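For illustration, leveraging those prebuilt quantized models is as simple as pulling them from the ollama registry; the model name below is just one example from the library:
# pull a prebuilt quantized model from the ollama library and run a one-shot prompt
ollama pull qwen2:0.5b
ollama run qwen2:0.5b "hello"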
Why is this needed:
Ecosystem integration.
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.