Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

py: add basic service URL resolver #421

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

isinyaaa
Copy link
Contributor

@isinyaaa isinyaaa commented Sep 23, 2024

Description

Adds a simple constructor that resolves cluster service URLs.

How Has This Been Tested?

Tested with the latest UI using a MR deployed directly through the settings panel.

The client then connects:

from model_registry import ModelRegistry

mr = ModelRegistry.from_service("modelregistry-sample", "isinyaaa")

and creates a new MV:

mr.register_model("test", "s3://catopia/meow.onnx", model_format_name="oooo", model_format_version="v1", version="123")

❗ IMPORTANT: to make this work with ODH deployments you need to create a clusterwide rolebinding for the current DSP service accounts, with the odh-dashboard role.

Change was reflected on the UI as expected:

image

Merge criteria:

  • All the commits have been signed-off (To pass the DCO check)
  • The commits have meaningful messages; the author will squash them after approval or in case of manual merges will ask to merge with squash.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.
  • Code changes follow the kubeflow contribution guidelines.

If you have UI changes

  • The developer has added tests or explained why testing cannot be added.
  • Included any necessary screenshots or gifs if it was a UI change.
  • Verify that UI/UX changes conform the UX guidelines for Kubeflow.

Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tomcli for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@isinyaaa
Copy link
Contributor Author

cc: @dhirajsb @tarilabs @Al-Pragliola

@isinyaaa isinyaaa marked this pull request as ready for review September 23, 2024 20:35
@google-oss-prow google-oss-prow bot requested a review from Tomcli September 23, 2024 20:35
Copy link
Member

@tarilabs tarilabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @isinyaaa , some early 👀 for your considerations

clients/python/src/model_registry/_client.py Outdated Show resolved Hide resolved
clients/python/src/model_registry/_client.py Outdated Show resolved Hide resolved
clients/python/src/model_registry/_client.py Outdated Show resolved Hide resolved
@tarilabs
Copy link
Member

@Al-Pragliola any thoughts on potential amendments which would not require a rolebinding or anyway any changes to the permission settings?

@Al-Pragliola
Copy link
Contributor

Al-Pragliola commented Sep 24, 2024

@Al-Pragliola any thoughts on potential amendments which would not require a rolebinding or anyway any changes to the permission settings?

mm no, I can't think of another way to let the notebook pod interact with the k8s api-server, but I think the odh-dashboard role is a bit too permissive, we should have an ad-hoc role with only the required permissions

@tarilabs
Copy link
Member

Thanks @Al-Pragliola , adding to #421 (comment), what about having some "fallbacks" in case that permission is not there? Do we believe we might have some options?

For example, attemping in case a list K8s resources in a set of known to be used namespaces, may require simpler/lower permissions? for example, listing MR CR instances in a set of known namespaces in case this require lower permission, or listing Services again in known namespaces, etc?

@isinyaaa
Copy link
Contributor Author

isinyaaa commented Sep 24, 2024

@tarilabs if we can't get DSC it falls back to looking for the instance in the specified namespace, or the default (kubeflow for upstream, should be changed to odh-model-registries when merging midstream)

@tarilabs
Copy link
Member

should be changed to odh-model-registries when merging midstream

I don't think that is viable, since there will be only 1 "upstream" pypi of the MR python client?

Hence why I believe would have been preferable to adopt a fallback(s) approach

@isinyaaa isinyaaa force-pushed the push-uzukprvowwly branch 4 times, most recently from 7f65be5 to f049130 Compare September 25, 2024 15:04
@rareddy
Copy link
Contributor

rareddy commented Sep 26, 2024

IMO the Python library should not use any OpenShift calls directly. That brings in additional dependency on where the MR deployed into a client library. I suggest we feed this information through some kind of config or env to keep it simple, then think through how this config can be supplied. I am thinking like .kubeconfig wdyt?

@dhirajsb
Copy link
Contributor

Yes, the DSC namespace discovery is an odh/rhoai feature, since there is no equivalent place to discover registries namespace in kubeflow. Although, the default MR manifests kind of make kubeflow namespace the default registry namespace.

We should keep things simple and allow this capability to discover the presence of DSC CR in the client library. The Python client can simply ignore that service discovery path if DSC resource type is not present and try to lookup registries using the hostname provided by users.

If we want to allow Python API to be service location agnostic, another way to do that would be by supporting an env variable for a default MR namespace, and also support specifying a direct service host. That way it can be also be injected into the notebook from a user supplied configmap (which could be cluster/env specific) if needed.

@dhirajsb
Copy link
Contributor

BTW, the Python client can do this without binding to OpenShift specific APIs using the K8s unstructured generic API.

@dhirajsb
Copy link
Contributor

The odh-dashboard role referenced by @isinyaaa is a workaround, not a requirement once Notebooks grants the right permissions midstream. That's being worked on by the odh notebooks team.

@rareddy
Copy link
Contributor

rareddy commented Oct 1, 2024

Apologies, when I mentioned OpenShift in general I was referring to Kube. saying in general Python client should not make any assumptions on where the MR is hosted IMO. Thinking more in terms of what you said above

If we want to allow Python API to be service location agnostic, another way to do that would be by supporting an env variable for a default MR namespace, and also support specifying a direct service host. That way it can be also be injected into the notebook from a user supplied configmap (which could be cluster/env specific) if needed.

then make some other automation happen to provide the config/env based on environment.

@rareddy
Copy link
Contributor

rareddy commented Oct 1, 2024

similar pattern by MLFlow https://mlflow.org/docs/latest/model-registry.html#id10

@tarilabs tarilabs mentioned this pull request Oct 4, 2024
7 tasks
@tarilabs
Copy link
Member

tarilabs commented Oct 4, 2024

similar pattern by MLFlow https://mlflow.org/docs/latest/model-registry.html#id10

I believe this is the link you wanted: https://mlflow.org/docs/latest/model-registry.html#databricks-unity-catalog-model-registry (the original link is an entry in a TOC)

Sounds nice and following up from #447 (comment) , what about using something analogous to Viper on the Go side, like https://pypi.org/project/12factor-configclasses/ that would support said injection?

This way:

  • for upstream KF, we can provide them in the Manifest for example
  • for midstream, folks can rely on their own Operators making the env or CM

Signed-off-by: Isabella do Amaral <[email protected]>
Signed-off-by: Isabella do Amaral <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants