Add HPA support to ChatQnA #327
Conversation
A dependency on Prometheus is needed for metrics monitoring.
Because HPA overwrites / changes several things, it might be better to have it as a separate PR?
RBAC rules installed by Prometheus allow it to query metrics only from services in certain namespaces, so asking Helm to install ChatQnA to some other namespace would mean there being no metrics for HPA. It would be good to mention that somewhere, maybe in the ChatQnA Helm chart README.
Fixed the comments; also made the ServiceMonitor take its name from Values, like the Service does.
Force-pushed from 258d387 to 24e1a63.
Some comment update proposals.
Added all the suggested comments.
Something like this could be added to the relevant chart READMEs, e.g. between their Install and Verify sections:

Added HPA sections to the relevant chart READMEs.
A bit more work is required for the README:

**Verify HPA metrics**

To verify that Prometheus found the metric endpoints, i.e. the last number on the line is non-zero:

The Prometheus adapter provides custom metrics for their data:

And those custom metrics have valid values for the HPA rules:

NOTE: HuggingFace TGI and TEI services provide a metrics endpoint only after they've processed their first request!
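The actual commands are collapsed in this view; a rough sketch of what such checks could look like (the `monitoring` namespace and the use of `jq` are assumptions, not from this PR):

```sh
# Check that Prometheus found the metric endpoints
# (assumes Prometheus is installed in the "monitoring" namespace):
kubectl -n monitoring get servicemonitors

# Check that the Prometheus adapter provides custom metrics:
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.resources[].name'

# Check that the custom metrics have valid values for the HPA rules,
# i.e. the TARGETS column shows numbers instead of "<unknown>":
kubectl get hpa
```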
If we don't care that the user cannot directly copy-paste the command, the last check could be just:
Helm chart helpers unconditionally add a selector label for deployments, so `ServiceMonitor`s can use that instead of a new `svc` label.
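For illustration, a ServiceMonitor relying on those selector labels could look roughly like this (the helper names such as `embedding-usvc.selectorLabels` are assumptions, not the chart's exact templates):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "embedding-usvc.fullname" . }}
spec:
  selector:
    matchLabels:
      # reuse the selector labels the chart helpers already add to the Deployment
      {{- include "embedding-usvc.selectorLabels" . | nindent 6 }}
  endpoints:
    - port: embedding-usvc
      interval: 4s
```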
OPEA has changed the service names, so the `*-svc` pattern cannot be used to match the relevant ones any more; each service needs its own grep pattern for validation.
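For example, instead of one shared `*-svc` pattern, validation would need per-service matching along these lines (the service name patterns here are illustrative assumptions, not taken from the charts):

```sh
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 \
  | grep -e tgi -e embedding -e reranking
```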
Besides the Helm charts, would you also update the manifests (https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector/config/manifests) to accommodate this change?
@irisdingbj As HPA support is disabled by default, running … If you meant generating an additional set of manifest files for HPA, I think that's a bad idea. The user would then miss the pre-conditions and gotchas documented in the Helm chart READMEs, and would have no way to configure HPA for the underlying cloud setup (e.g. how many replicas each deployment can be scaled to, which depends on how many nodes are available).
Nowadays Helm chart `Service` declarations use the service's name also for port names. As the port name is hard-coded in `Service`s, I'm suggesting the same for `ServiceMonitor`s, but I think both could as well switch to using a Helm `include` instead...
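To illustrate the `include` alternative: both templates would reference one shared helper, so the port name cannot drift between the Service and the ServiceMonitor (the helper name is hypothetical):

```yaml
# _helpers.tpl (hypothetical helper)
{{- define "embedding-usvc.portName" -}}
embedding-usvc
{{- end -}}

# service.yaml
ports:
  - name: {{ include "embedding-usvc.portName" . }}
    port: 6000

# servicemonitor.yaml
endpoints:
  - port: {{ include "embedding-usvc.portName" . }}
```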
Came up with an IMHO slightly more readable command-line example than what I had earlier, but the earlier one works fine too.
Added the suggested fixes, rebased onto latest main.
Approved, but please also update `helm-charts/chatqna/README.md` as suggested above.
I'll also update the generated manifests.
The failing E2E test seems to be a network issue:

```
docker push 100.80.243.74:5000/opea/gmcmanager:84507fc42fbae6a2104ce36457d8ddc5b02c4354
The push refers to repository [100.80.243.74:5000/opea/gmcmanager]
...
337cf9c1bd1f: Retrying in 1 second
received unexpected HTTP status: 500 Internal Server Error
```
Signed-off-by: Alexey Fomenko <[email protected]>
for more information, see https://pre-commit.ci
Review comment on `helm-charts/chatqna/README.md` (outdated), hunk `@@ -34,6 +34,35 @@`:

> 1. Make sure your `MODELDIR` exists on the node where your workload is scheduled so you can cache the downloaded model for next time use. Otherwise, set `global.modelUseHostPath` to 'null' if you don't want to cache the model.
>
> ## HorizontalPodAutoscaler (HPA) support
This HPA support section is generic enough; I think we could maybe put it in one place instead of copy/pasting it: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/README.md
Done. Thanks @eero-t for preparing the patch.
@irisdingbj can you approve this and add the v0.9 label? Perhaps merge as well, since automatic merge does not happen because the E2E test fails due to a network error in GMC.
I squashed the whitespace commit from CI into the previous commit and force-pushed to trigger a new test round.
All tests passed now and the PR is merged into the main branch. The v0.9 label is already added. Please ask @daisy-ycguo about the process for merging into the v0.9 release.
Description
This PR introduces HPA support to ChatQnA TGI, Embedding and Reranking services based on custom metrics.
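Since the support is disabled by default, enabling it at deployment time would be an explicit opt-in, roughly along these lines (the exact value key `horizontalPodAutoscaler.enabled` is an assumption based on the template names; check the chart's `values.yaml` for the real option):

```sh
helm install chatqna chatqna \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --set horizontalPodAutoscaler.enabled=true
```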