GMC: Add GPU support for GMC. #292

Merged: 2 commits, Aug 14, 2024
68 changes: 68 additions & 0 deletions microservices-connector/config/samples/chatQnA_nv.yaml
@@ -0,0 +1,68 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
    app.kubernetes.io/managed-by: kustomize
    gmc/platform: nvidia
  name: chatqa
  namespace: chatqa
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: Embedding
          internalService:
            serviceName: embedding-svc
            config:
              endpoint: /v1/embeddings
              TEI_EMBEDDING_ENDPOINT: tei-embedding-svc
        - name: TeiEmbedding
          internalService:
            serviceName: tei-embedding-svc
            isDownstreamService: true
        - name: Retriever
          data: $response
          internalService:
            serviceName: retriever-svc
            config:
              endpoint: /v1/retrieval
              REDIS_URL: redis-vector-db
              TEI_EMBEDDING_ENDPOINT: tei-embedding-svc
        - name: VectorDB
          internalService:
            serviceName: redis-vector-db
            isDownstreamService: true
        - name: Reranking
          data: $response
          internalService:
            serviceName: reranking-svc
            config:
              endpoint: /v1/reranking
              TEI_RERANKING_ENDPOINT: tei-reranking-svc
        - name: TeiReranking
          internalService:
            serviceName: tei-reranking-svc
            config:
              endpoint: /rerank
            isDownstreamService: true
        - name: Llm
          data: $response
          internalService:
            serviceName: llm-svc
            config:
              endpoint: /v1/chat/completions
              TGI_LLM_ENDPOINT: tgi-service-m
        - name: TgiNvidia
          internalService:
            serviceName: tgi-service-m
            config:
              endpoint: /generate
            isDownstreamService: true
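A minimal sketch of exercising this sample, assuming the usage guide's workflow (see below) and the `microservices-connector` directory as the working directory:

```sh
kubectl create ns chatqa
kubectl apply -f $(pwd)/config/samples/chatQnA_nv.yaml
# Watch GMC reconcile the pipeline; the TgiNvidia step becomes the GPU-backed pod.
kubectl get pods -n chatqa -w
```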
124 changes: 124 additions & 0 deletions microservices-connector/config/samples/chatQnA_switch_nv.yaml
@@ -0,0 +1,124 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
    app.kubernetes.io/managed-by: kustomize
    gmc/platform: nvidia
  name: switch
  namespace: switch
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: Embedding
          nodeName: node1
        - name: Reranking
          data: $response
          internalService:
            serviceName: reranking-svc
            config:
              endpoint: /v1/reranking
              TEI_RERANKING_ENDPOINT: tei-reranking-svc
        - name: TeiReranking
          internalService:
            serviceName: tei-reranking-svc
            config:
              endpoint: /rerank
            isDownstreamService: true
        - name: Llm
          data: $response
          nodeName: node2
    node1:
      routerType: Switch
      steps:
        - name: Embedding
          condition: embedding-model-id==large
          internalService:
            serviceName: embedding-svc-large
            config:
              endpoint: /v1/embeddings
              TEI_EMBEDDING_ENDPOINT: tei-embedding-svc-bge15
        - name: Embedding
          condition: embedding-model-id==small
          internalService:
            serviceName: embedding-svc-small
            config:
              endpoint: /v1/embeddings
              TEI_EMBEDDING_ENDPOINT: tei-embedding-svc-bge-small
        - name: TeiEmbedding
          internalService:
            serviceName: tei-embedding-svc-bge15
            config:
              MODEL_ID: BAAI/bge-base-en-v1.5
            isDownstreamService: true
        - name: TeiEmbedding
          internalService:
            serviceName: tei-embedding-svc-bge-small
            config:
              MODEL_ID: BAAI/bge-base-en-v1.5
            isDownstreamService: true
        - name: Retriever
          condition: embedding-model-id==large
          data: $response
          internalService:
            serviceName: retriever-svc-large
            config:
              endpoint: /v1/retrieval
              REDIS_URL: redis-vector-db-large
              TEI_EMBEDDING_ENDPOINT: tei-embedding-svc-bge15
        - name: Retriever
          condition: embedding-model-id==small
          data: $response
          internalService:
            serviceName: retriever-svc-small
            config:
              endpoint: /v1/retrieval
              REDIS_URL: redis-vector-db-small
              TEI_EMBEDDING_ENDPOINT: tei-embedding-svc-bge-small
        - name: VectorDB
          internalService:
            serviceName: redis-vector-db-large
            isDownstreamService: true
        - name: VectorDB
          internalService:
            serviceName: redis-vector-db-small
            isDownstreamService: true
    node2:
      routerType: Switch
      steps:
        - name: Llm
          condition: model-id==intel
          internalService:
            serviceName: llm-svc-intel
            config:
              endpoint: /v1/chat/completions
              TGI_LLM_ENDPOINT: tgi-service-intel
        - name: Llm
          condition: model-id==llama
          internalService:
            serviceName: llm-svc-llama
            config:
              endpoint: /v1/chat/completions
              TGI_LLM_ENDPOINT: tgi-service-llama
        - name: TgiNvidia
          internalService:
            serviceName: tgi-service-intel
            config:
              endpoint: /generate
              MODEL_ID: Intel/neural-chat-7b-v3-3
            isDownstreamService: true
        - name: TgiNvidia
          internalService:
            serviceName: tgi-service-llama
            config:
              endpoint: /generate
              MODEL_ID: bigscience/bloom-560m
            isDownstreamService: true
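The two `Switch` nodes route on fields in the request body whose keys mirror the `condition` values above (`embedding-model-id`, `model-id`). A hedged sketch of steering a request, assuming the router reads those keys from the JSON payload:

```sh
accessUrl=$(kubectl get gmc -n switch -o jsonpath="{.items[?(@.metadata.name=='switch')].status.accessUrl}")
# Route to the small embedding model and the Intel-tuned LLM.
curl $accessUrl -X POST -H 'Content-Type: application/json' \
  -d '{"text":"What is the revenue of Nike in 2023?","embedding-model-id":"small","model-id":"intel","parameters":{"max_new_tokens":17,"do_sample":true}}'
```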
@@ -45,12 +45,14 @@ const (
	TeiReranking = "TeiReranking"
	Tgi          = "Tgi"
	TgiGaudi     = "TgiGaudi"
	TgiNvidia    = "TgiNvidia"
	Llm          = "Llm"
	DocSum       = "DocSum"
	Router       = "router"
	DataPrep     = "DataPrep"
	xeon         = "xeon"
	gaudi        = "gaudi"
	nvidia       = "nvidia"
	WebRetriever = "WebRetriever"
	yaml_dir     = "/tmp/microservices/yamls/"
	Service      = "Service"
@@ -76,6 +78,7 @@ var yamlDict = map[string]string{
	TeiReranking: yaml_dir + "teirerank.yaml",
	Tgi:          yaml_dir + "tgi.yaml",
	TgiGaudi:     yaml_dir + "tgi_gaudi.yaml",
	TgiNvidia:    yaml_dir + "tgi_nv.yaml",
	Llm:          yaml_dir + "llm-uservice.yaml",
	DocSum:       yaml_dir + "docsum-llm-uservice.yaml",
	Router:       yaml_dir + "gmc-router.yaml",
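The new `TgiNvidia` entry resolves that step to a `tgi_nv.yaml` template under `yaml_dir`. A hedged sanity check that the template ships with the controller image (the `system` namespace and `gmc-controller` deployment name are assumptions; adjust to your install):

```sh
kubectl exec -n system deploy/gmc-controller -- ls /tmp/microservices/yamls/ | grep tgi_nv
```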
6 changes: 5 additions & 1 deletion microservices-connector/usage_guide.md
@@ -14,6 +14,10 @@ A sample for chatQnA can be found at config/samples/chatQnA_xeon.yaml
```sh
kubectl create ns chatqa
kubectl apply -f $(pwd)/config/samples/chatQnA_xeon.yaml
# To use Gaudi device
#kubectl apply -f $(pwd)/config/samples/chatQnA_gaudi.yaml
# To use Nvidia GPU
#kubectl apply -f $(pwd)/config/samples/chatQnA_nv.yaml
```
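The NVIDIA switch sample added in this PR follows the same pattern; a sketch using the namespace declared in its manifest:

```sh
kubectl create ns switch
kubectl apply -f $(pwd)/config/samples/chatQnA_switch_nv.yaml
```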

**GMC will reconcile the chatQnA custom resource and get all related components/services ready**
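A quick way to watch that reconciliation (a sketch; resource names follow the chatqa sample):

```sh
kubectl get pods -n chatqa
kubectl get gmc -n chatqa
```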
@@ -39,7 +43,7 @@ kubectl create deployment client-test -n chatqa --image=python:3.8.13 -- sleep infinity
**Access the pipeline using the above URL from the client pod**

```bash
-export CLIENT_POD=$(kubectl get pod -l app=client-test -o jsonpath={.items..metadata.name})
+export CLIENT_POD=$(kubectl get pod -n chatqa -l app=client-test -o jsonpath={.items..metadata.name})
export accessUrl=$(kubectl get gmc -n chatqa -o jsonpath="{.items[?(@.metadata.name=='chatqa')].status.accessUrl}")
kubectl exec "$CLIENT_POD" -n chatqa -- curl $accessUrl -X POST -d '{"text":"What is the revenue of Nike in 2023?","parameters":{"max_new_tokens":17, "do_sample": true}}' -H 'Content-Type: application/json'
```
**Collaborator** (review comment on the `CLIENT_POD` change above): good catch!