feat(ingestion): extend feast plugin to ingest tags and owners #11784

Open: wants to merge 14 commits into base: master

@margaridafernandes-trip commented Nov 4, 2024

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions bot added the "ingestion" (PR or Issue related to the ingestion of metadata) and "community-contribution" (PR or Issue raised by member(s) of DataHub Community) labels Nov 4, 2024
@@ -45,7 +45,7 @@ arm64_darwin_preflight() {
 pip3 install --no-use-pep517 scipy
 fi

-brew_install "openssl@1.1"
+brew_install "openssl@3.0.14"
Collaborator

Quick question: why this change?

I needed to upgrade the dependency version because the 1.1 version was deprecated.

I reverted to the previous version

@@ -1,4 +1,4 @@
-#!/bin/bash -e
+#!/bin/bash -e
Collaborator

Why this space change?

fixed

if feature_view.tags.get("name"):
    tag = feature_view.tags.get("name")
    tag_association = TagAssociationClass(tag=builder.make_tag_urn(tag))
    global_tags_aspect = GlobalTagsClass(tags=[tag_association])
Collaborator

Just a note: if you've attached tags in DataHub, this will replace them by default :)

The goal is to have Feast as our source of truth, so the fields must be defined in the Feast repo.
Do you agree with this?
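
To make the trade-off in this thread concrete, here is a minimal sketch in plain Python (no DataHub imports) contrasting the two behaviours: emitting the full tag list from the source replaces whatever was attached before, while a merge keeps tags added elsewhere. The names replace_tags and merge_tags are hypothetical, make_tag_urn is a local stand-in rather than the library function, and lists of URN strings stand in for the real aspect objects.

```python
from typing import List


def make_tag_urn(tag: str) -> str:
    # Local stand-in that mirrors the "urn:li:tag:<name>" shape of tag URNs.
    return f"urn:li:tag:{tag}"


def replace_tags(existing: List[str], incoming: List[str]) -> List[str]:
    # Source-of-truth behaviour: whatever the source emits wins.
    return list(incoming)


def merge_tags(existing: List[str], incoming: List[str]) -> List[str]:
    # Alternative: keep existing tags and append unseen ones, preserving order.
    merged = list(existing)
    for urn in incoming:
        if urn not in merged:
            merged.append(urn)
    return merged


ui_tags = [make_tag_urn("pii")]            # tag attached by hand in the UI
feast_tags = [make_tag_urn("deprecated")]  # tag defined in the Feast repo

print(replace_tags(ui_tags, feast_tags))
print(merge_tags(ui_tags, feast_tags))
```

With Feast as the source of truth, the replace behaviour is the intended one; the merge variant is shown only for comparison.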

    global_tags_aspect = GlobalTagsClass(tags=[tag_association])
    aspects.append(global_tags_aspect)

if feature_view.owner:
Collaborator

Just a note: if you've attached owners in the DataHub UI, this will replace them by default :)

See comment above

aspects = [StatusClass(removed=False)]

if entity.tags.get("name"):
    tag: str = entity.tags.get("name")
Collaborator

Minor: is there any way to extract the owner and tags logic into a reusable method?

Author

I have already created a reusable method for the tags and owner logic. The only thing missing concerns _owner_mapping (since we can only ingest the owner from Feast and we need to define the owner_type and owner_ship_type_class, the solution I'm seeing is to have a mapping to these values):

_owner_mapping: Dict[str, Dict[str, Any]] = {
    "Datahub": {
        "owner_type": builder.OwnerType.USER,
        "owner_ship_type_class": OwnershipTypeClass.DATAOWNER,
    }
}

The issue here is having to specify in the DataHub repo the specific use cases that each team might have (for example, in our case we will have an owner "MLOps"), which might not be applicable to other teams.
Do you see any other possible solution?
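
One possible direction, sketched here under stated assumptions rather than taken from the PR, is to let each team supply the mapping through configuration instead of hard-coding it in the repo. Everything below is hypothetical: the OwnerMapping class, the resolve_owner helper, and the plain strings that stand in for builder.OwnerType and OwnershipTypeClass members.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class OwnerMapping:
    # Hypothetical config entry; plain strings stand in for the
    # builder.OwnerType and OwnershipTypeClass values used in the PR.
    owner_type: str
    ownership_type: str


def resolve_owner(
    feast_owner: str,
    mappings: Dict[str, OwnerMapping],
    default: Optional[OwnerMapping] = None,
) -> Optional[OwnerMapping]:
    # Look up the Feast owner string; fall back to the default (if any),
    # so unknown owners are either skipped or given a generic type.
    return mappings.get(feast_owner, default)


# Each team would supply its own entries, e.g. from the ingestion recipe.
team_mappings = {
    "MLOps": OwnerMapping(owner_type="GROUP", ownership_type="TECHNICAL_OWNER"),
    "Datahub": OwnerMapping(owner_type="USER", ownership_type="DATAOWNER"),
}

print(resolve_owner("MLOps", team_mappings))
print(resolve_owner("unknown", team_mappings))
```

Keeping the lookup separate from the mapping data means the plugin itself would not need to know about team-specific owners such as "MLOps".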

@margaridafernandes-trip changed the title from "feat(ingestion): extend feast plugin to ingest tags for features" to "feat(ingestion): extend feast plugin to ingest tags and owners" Nov 8, 2024
@@ -360,12 +363,46 @@ def _get_on_demand_feature_view_workunit(

        return MetadataWorkUnit(id=on_demand_feature_view_name, mce=mce)

    def _get_tags_and_owners(self, obj: Union[Entity, FeatureView, FeastField]) -> list:
Collaborator

Well written and defensive. Nice work!
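
The body of the helper is not shown in this excerpt, so the following is only an illustrative sketch of what a defensive extraction of this kind can look like. It is not the PR's implementation: the function name, the dict-based records, and the SimpleNamespace test objects are all assumptions made for the example.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def get_tags_and_owners(obj: Any) -> List[Dict[str, str]]:
    # Collect tag/owner records from any object that may expose
    # `tags` (a dict) and `owner` (a string); missing or empty
    # attributes are skipped instead of raising.
    records: List[Dict[str, str]] = []

    tags = getattr(obj, "tags", None) or {}
    tag_name = tags.get("name")
    if tag_name:
        records.append({"kind": "tag", "value": tag_name})

    owner = getattr(obj, "owner", None)
    if owner:
        records.append({"kind": "owner", "value": owner})

    return records


feature_view = SimpleNamespace(tags={"name": "deprecated"}, owner="MLOps")
field = SimpleNamespace(tags={})  # no owner attribute at all

print(get_tags_and_owners(feature_view))
print(get_tags_and_owners(field))
```

Using getattr with a default is what lets a single helper serve object types that do not all carry an owner.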

# FIXME: Update to have more owners
_owner_mapping: Dict[str, Dict[str, Any]] = {
    "Datahub": {
        "owner_type": builder.OwnerType.USER,
Collaborator

Does Feast support GROUP owners as well? Thanks for calling this out!

Yes, we can pass the owner as a "Person" (user) or a "Group".
[Screenshot 2024-11-08 at 17 00 44]

The issue is that on the Feast side I can only specify the owner as a str (for example, "Margarida"), and this has to map to an OwnerType.USER and to the appropriate OwnershipTypeClass.

Collaborator

OK, got it!

2 participants