A metrics collector for Kubeflow Pipeline Metrics artifacts #2019
Comments
I think I may give this a go. I would try to build this in Python, analogous to the tfevent-metricscollector.
Small update: what I am failing to do is pass the current trial name to the custom collector in the … My CLI metrics collector takes an argument `-t` or `--trial_name` with the trial name to use for reporting (exactly as the tfevent-metricscollector does).
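For illustration, a CLI entry point with that flag could look roughly like the sketch below. Only `-t`/`--trial_name` comes from the description above; the `--metrics_file` option and its default path are hypothetical placeholders, not the actual collector's interface.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line of a hypothetical KFP metrics collector."""
    parser = argparse.ArgumentParser(
        description="Sketch of a metrics collector CLI (illustrative only)"
    )
    # Taken from the comment above: the trial name used for reporting.
    parser.add_argument(
        "-t", "--trial_name", required=True,
        help="Name of the Katib trial to report metrics for",
    )
    # Hypothetical: where the pipeline step writes its metrics file.
    parser.add_argument(
        "--metrics_file", default="/tmp/metrics.json",
        help="Path to the metrics file (placeholder default)",
    )
    return parser.parse_args(argv)


# Example call with an explicit argument list instead of sys.argv.
args = parse_args(["-t", "my-trial"])
print(args.trial_name, args.metrics_file)
```

Passing an explicit list to `parse_args` keeps the sketch testable without touching the real command line.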
I am now really a bit confused:
Yet looking at the pods Katib creates, all these arguments seem to be missing. My current section to specify the metrics collector:
Which creates a specification as:
Closely modelled after the tfevent-metricscollector. Currently not yet working, because the arguments from the `injector_webhoook` are somehow not passed. Addresses: kubeflow#2019
This uses the new custom KFP V1 metrics collector, which can directly extract metrics from Kubeflow Pipeline metrics. With this collector, measuring the metrics of a Kubeflow pipeline only requires a) adding the label that marks which step is the `model-training` step, b) disabling caching for this step, and c) configuring the Katib metrics collector. All the information needed to run the Katib pipeline via the KatibClient has now been added as well. Addresses: - kubeflow/katib#1914 - kubeflow/katib#2019
I have now managed to implement a working metrics collector for Kubeflow Pipeline V1 Metrics artifacts: For a full example of how this is used, see: https://github.com/votti/katib-exploration/blob/main/notebooks/mnist_pipeline_v1.ipynb @Push: I think it is an interesting idea to build a dedicated KubeflowPipeline component that can push metrics to
This example illustrates how a full kfp pipeline can be tuned using Katib. It is based on a metrics collector to collect kubeflow pipeline metrics (kubeflow#2019). This is used as a Custom Collector. Addresses: kubeflow#1914, kubeflow#2019
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hello, any update for KFP v2?
@AlexandreBrown We've worked on a Katib + KFP example in this PR: #2118
Great to see progress, was this PR made for kfp v2 or only v1?
That PR is only for v1.
@AlexandreBrown
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/lifecycle frozen
/kind feature
Describe the solution you'd like
Currently, one aim is to do parameter tuning over pipelines in Katib (#1914, #1993).
Kubeflow pipelines allow for dedicated metrics artifacts:
https://kubeflow-pipelines.readthedocs.io/en/master/source/dsl.html?h=metrics#kfp.dsl.Metrics
https://www.kubeflow.org/docs/components/pipelines/v1/sdk/pipelines-metrics/
Having a dedicated Katib sidecar metrics collector that collects the metrics from these artifacts would make pipelines and Katib work together quite nicely.
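To make the idea a bit more concrete, here is a minimal sketch of the parsing step such a collector would need. It assumes the KFP v1 metrics file layout described in the docs linked above (a top-level `metrics` list whose entries carry `name`, `numberValue`, and `format`). The function name `extract_metrics` and the `wanted` filter are hypothetical, not part of Katib or KFP, and reporting the values to Katib is left out entirely.

```python
import json


def extract_metrics(metrics_json, wanted=None):
    """Return (name, value) pairs from a KFP v1 style metrics JSON string.

    `wanted` is an optional set of metric names to keep (hypothetical helper
    argument); when it is None, every metric is returned.
    """
    payload = json.loads(metrics_json)
    results = []
    for entry in payload.get("metrics", []):
        name = entry.get("name")
        value = entry.get("numberValue")
        # Skip malformed entries instead of failing the whole collection.
        if name is None or value is None:
            continue
        if wanted is not None and name not in wanted:
            continue
        results.append((name, float(value)))
    return results


# Made-up example content, shaped like a KFP v1 metrics file.
example = json.dumps({
    "metrics": [
        {"name": "accuracy", "numberValue": 0.93, "format": "PERCENTAGE"},
        {"name": "loss", "numberValue": 0.12, "format": "RAW"},
    ]
})
print(extract_metrics(example, wanted={"loss"}))
```

Filtering by name mirrors the fact that a Katib experiment usually only cares about its objective and a few additional metrics, not everything a pipeline step reports.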
The current workaround is to use the stdout collector, but this causes issues with the complex commands in pipeline components (#1914, will add dedicated issue soon).
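For context, the stdout workaround boils down to the training step printing its metrics as `name=value` lines, which is the default text format Katib's StdOut collector looks for. A minimal sketch, where `format_metric_lines` is a hypothetical helper name and the metric values are made up:

```python
def format_metric_lines(metrics):
    """Render a dict of metrics as 'name=value' lines for stdout collection."""
    return [f"{name}={value}" for name, value in metrics.items()]


# Made-up values; each printed line has the form name=value.
for line in format_metric_lines({"accuracy": 0.93, "loss": 0.12}):
    print(line)
```

This only works when the step's stdout actually reaches the collector, which is where the wrapped commands in pipeline components get in the way.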
Anything else you would like to add: