GenAI Zurich Workshop

Build a RAG with Prem Platform

Currently the platform is not open-source, but we plan to release it open-source in the future.

Upload the documents in a Prem Repository
Test the Assistant from the Lab
Launch the Best Model and configurations
Integrate Prem in your Product with our official SDKs

We prepared a script in order to interact with the assistant we created through Prem Platform.

cd ./rag_demo
python chat_saas.py

Deploy a Model in your Infrastructure with Prem Operator

Create a Paperspace instance, install Kubernetes (K3s) and various utilities

You can install the utilities (k3sup, kubectl, helm, etc.) on your local machine if you prefer. Below we show installing everything on the remote machine while logged in via SSH

# Install k3sup
curl -sLS https://get.k3sup.dev | sh
sudo install k3sup /usr/local/bin

# Install k3s
# NOTE: Replace --local with the --ip/--host to install k3s remotely
k3sup install --local --local-path $HOME/.kube/config

# Install kubectl
sudo apt update
sudo snap install kubectl --classic
kubectl version --client

# Check the cluster is configured
kubectl get nodes
# NOTE: if this doesn't return a node, you can also try the following
sudo k3s kubectl get nodes

# Install Helm
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod +x get_helm.sh
./get_helm.sh
helm version

# Optionally install k9s as a TUI alternative to kubectl
curl -OL https://github.com/derailed/k9s/releases/download/v0.32.4/k9s_linux_amd64.deb
sudo apt install ./k9s_linux_amd64.deb

Install the NVIDIA Operator

# Remove any pre-installed NVIDIA drivers and container runtime
sudo apt purge cuda-*
sudo apt purge nvidia-*
sudo apt autoremove
sudo apt autoclean

# Downgrade the Linux kernel because of https://forums.developer.nvidia.com/t/nvidia-modeset-unknown-symbol-on-module-load-error/239848
sudo apt install linux-image-5.15.0-107-generic
sudo apt remove linux-image-generic-hwe-22.04
sudo apt remove linux-image-6.2.0-37-generic
# Reboot to load the new kernel
sudo systemctl reboot

# Install the NVIDIA operator
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install --wait --generate-name -n gpu-operator --create-namespace --set driver.version=550-5.15.0-107 --set driver.usePrecompiled=true nvidia/gpu-operator

# wait for the driver to be installed, this is where k9s is useful...
# if the NVIDIA pods for the driver and runtime succeed, but the other pods are stuck in init then restart
sudo systemctl reboot

Clone the prem-operator repository

git clone https://github.com/premAI-io/prem-operator.git

Install Prem Operator

helm install latest oci://registry-1.docker.io/premai/prem-operator-chart

Deploy the Phi-2 + Elia CPU example and exec Elia in a terminal session

cd ./prem-operator
kubectl apply -f examples/llama3-8b-gguf.yaml
# Grab a coffee while the container downloads
kubectl exec -it deployments/llama-3-tui -- elia
# Type a message and wait a little more for the model to load

Remove the example deployment and Prem Operator

kubectl delete aideployment/llama-3-8b-gguf
kubectl delete aimodelmap/llama-3-8b-gguf
kubectl delete deployment/llama-3-cli
kubectl delete deployment/llama-3-tui
helm uninstall latest

Build a RAG with Prem 1B Chat

Go to rag_demo folder

cd rag_demo

Install the requirements:

pip install -r requirements.txt

Start jupyter lab and run the noteboook:

jupyter lab --allow-root

Some questions to try out

What is the key feature of ChatEval compared to the other evaluation strategies?

ChatEval is a multi-agent system employed for evaluation of LLMs. Where each agent represents a different persona (achieved through role prompts). This is essential in the multi-agent debate process, improving the evaluator's performance.

To train the evaluator 'Prometheus 2', the authors introduced a new dataset called 'preference collection'. What are the key features of this dataset?

PREFERENCE COLLECTION: the first pairwise ranking dataset that includes over 1,000 instance-wise evaluation criteria beyond basic qualities such as helpfulness and harmlessness.

How does the Infini-attention technique aim to address the problem related to limited context in generative models?

The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-term linear attention mechanisms in a single Transformer block.

Name		Name	Last commit message	Last commit date
Latest commit History 38 Commits
rag_demo		rag_demo
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
image.png		image.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GenAI Zurich Workshop

Build a RAG with Prem Platform

Deploy a Model in your Infrastructure with Prem Operator

Build a RAG with Prem 1B Chat

Some questions to try out

About

Releases

Packages

Contributors 5

Languages

License

premAI-io/genai-zurich-workshop

Folders and files

Latest commit

History

Repository files navigation

GenAI Zurich Workshop

Build a RAG with Prem Platform

Deploy a Model in your Infrastructure with Prem Operator

Build a RAG with Prem 1B Chat

Some questions to try out

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 5

Languages

Packages