URN (Uniform Resource Name) is the URI scheme DataHub uses to uniquely identify any resource. It has the following form:
```
urn:<Namespace>:<Entity Type>:<ID>
```
Onboarding a new entity to GMA starts with modelling a URN specific to that entity. You can use the existing URN models for built-in entities as a reference.
All URNs in DataHub use `li` as their namespace. If you fork DataHub, you can easily change this to a namespace for your own organization.
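As a minimal sketch of the `urn:<Namespace>:<Entity Type>:<ID>` form, the helper below builds a URN whose ID is a single field. The names `make_urn`, `DEFAULT_NAMESPACE`, and the `namespace` parameter are hypothetical and are not part of DataHub's API; defaulting to `li` mirrors the convention described above, and a fork would pass or hard-code its own value.

```python
# Hypothetical helper (not DataHub's API): builds urn:<Namespace>:<Entity Type>:<ID>
# for a URN whose ID consists of a single field.
DEFAULT_NAMESPACE = "li"  # DataHub's namespace; a fork could change this constant


def make_urn(entity_type: str, id_value: str, namespace: str = DEFAULT_NAMESPACE) -> str:
    """Return a single-ID URN string such as 'urn:li:corpuser:jdoe'."""
    if not namespace or not entity_type or not id_value:
        raise ValueError("namespace, entity type, and ID must all be non-empty")
    return f"urn:{namespace}:{entity_type}:{id_value}"


if __name__ == "__main__":
    print(make_urn("corpuser", "jdoe"))                    # urn:li:corpuser:jdoe
    print(make_urn("corpuser", "jdoe", namespace="acme"))  # urn:acme:corpuser:jdoe
```

Keeping the namespace as a single default value means a fork only has to change it in one place rather than in every URN it constructs.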
The entity type of a URN is not the same as an entity in the GMA context. Think of it as the object type of any resource whose instances each need a unique identifier. While you can create URNs for GMA entities, such as [DatasetUrn] with entity type `dataset`, you can also define URNs for other resources, such as data platforms ([DataPlatformUrn]).
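To illustrate that a URN's entity type is simply the object type of the resource being identified, whether or not that resource is a GMA entity, here is a sketch of a typed URN for data platforms. The class `DataPlatformUrnSketch` is hypothetical and is not DataHub's `DataPlatformUrn`; it only shows how fixing the entity type per class lets you construct and validate instances of one resource type.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class DataPlatformUrnSketch:
    """Hypothetical typed URN for data platforms (not DataHub's DataPlatformUrn)."""

    NAMESPACE: ClassVar[str] = "li"
    ENTITY_TYPE: ClassVar[str] = "dataPlatform"  # URN entity type, not a GMA entity
    platform_name: str  # the single ID field, e.g. "kafka"

    def __str__(self) -> str:
        return f"urn:{self.NAMESPACE}:{self.ENTITY_TYPE}:{self.platform_name}"

    @classmethod
    def from_string(cls, urn: str) -> "DataPlatformUrnSketch":
        """Parse a string, rejecting URNs of any other entity type."""
        prefix = f"urn:{cls.NAMESPACE}:{cls.ENTITY_TYPE}:"
        if not urn.startswith(prefix) or len(urn) == len(prefix):
            raise ValueError(f"not a {cls.ENTITY_TYPE} URN: {urn!r}")
        return cls(urn[len(prefix):])
```

Because the entity type is a class-level constant, a `corpuser` URN passed to `from_string` is rejected instead of being silently treated as a platform.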
ID is the unique identifier part of a URN. It is unique for a specific entity type within a specific namespace. An ID can contain a single field, or multiple fields in the case of complex URNs. A complex URN can even contain other URNs as ID fields; such a URN is also referred to as a nested URN. For non-URN ID fields, the value can be a string, a number, or a Pegasus Enum.
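The following sketch shows one way to serialize single- and multi-field IDs, following the `(field1,field2,...)` shape of the DatasetUrn examples in this section. It assumes a Python `Enum` as a stand-in for a Pegasus Enum and passes nested URNs as already-formatted strings; `Fabric`, `format_id_field`, and `make_complex_urn` are hypothetical names, and the sketch does not escape commas or parentheses inside field values.

```python
from enum import Enum
from typing import Union


class Fabric(Enum):
    """Hypothetical Python stand-in for a Pegasus Enum ID field."""
    PROD = "PROD"
    EI = "EI"


# An ID field is a string (which may itself be a nested URN), a number, or an enum.
IdField = Union[str, int, Enum]


def format_id_field(value: IdField) -> str:
    """Render one ID field; enums are rendered by their symbol name."""
    if isinstance(value, Enum):
        return value.name
    if isinstance(value, (str, int)):
        return str(value)
    raise TypeError(f"unsupported ID field type: {type(value).__name__}")


def make_complex_urn(entity_type: str, *fields: IdField, namespace: str = "li") -> str:
    """Build a URN; multiple ID fields are wrapped as (f1,f2,...)."""
    if not fields:
        raise ValueError("at least one ID field is required")
    parts = [format_id_field(f) for f in fields]
    id_part = parts[0] if len(parts) == 1 else "(" + ",".join(parts) + ")"
    return f"urn:{namespace}:{entity_type}:{id_part}"
```

For example, `make_complex_urn("dataset", "urn:li:dataPlatform:kafka", "PageViewEvent", Fabric.PROD)` yields `urn:li:dataset:(urn:li:dataPlatform:kafka,PageViewEvent,PROD)`, while a single field produces a plain single-ID URN.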
Here are some example URNs with a single ID field:
```
urn:li:dataPlatform:kafka
urn:li:corpuser:jdoe
```
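A sketch of splitting single-ID URNs like these back into their parts. Splitting on at most three colons keeps everything after the entity type as the ID, so the same function also works as a first step for complex URNs whose IDs contain colons. `parse_urn` and `UrnParts` are hypothetical names, not DataHub APIs.

```python
from typing import NamedTuple


class UrnParts(NamedTuple):
    namespace: str    # e.g. "li"
    entity_type: str  # e.g. "dataPlatform"
    id_value: str     # everything after the entity type, unparsed


def parse_urn(urn: str) -> UrnParts:
    """Split 'urn:<Namespace>:<Entity Type>:<ID>' without interpreting the ID."""
    parts = urn.split(":", 3)  # at most 4 pieces; the ID keeps any inner colons
    if len(parts) != 4 or parts[0] != "urn" or not all(parts[1:]):
        raise ValueError(f"malformed URN: {urn!r}")
    _, namespace, entity_type, id_value = parts
    return UrnParts(namespace, entity_type, id_value)
```

Leaving the ID unparsed here keeps this step generic; interpreting multi-field or nested IDs is a separate, entity-type-specific step.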
DatasetUrn is an example of a complex nested URN. It contains three ID fields, `platform`, `name`, and `fabric`, where `platform` is itself another URN. Here are some examples:
```
urn:li:dataset:(urn:li:dataPlatform:kafka,PageViewEvent,PROD)
urn:li:dataset:(urn:li:dataPlatform:hdfs,PageViewEvent,EI)
```
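Parsing a nested URN such as DatasetUrn requires splitting the ID tuple only on commas that are not inside a nested URN's own parentheses. The sketch below does this by tracking parenthesis depth. It assumes field values never contain unescaped commas or parentheses, and `split_top_level`, `parse_dataset_urn`, and `DatasetUrnParts` are hypothetical names rather than DataHub's implementation.

```python
from typing import List, NamedTuple


class DatasetUrnParts(NamedTuple):
    platform: str  # nested dataPlatform URN
    name: str
    fabric: str


def split_top_level(id_tuple: str) -> List[str]:
    """Split '(a,b,(c,d))' on commas that are not inside nested parentheses."""
    if not (id_tuple.startswith("(") and id_tuple.endswith(")")):
        raise ValueError(f"not a tuple ID: {id_tuple!r}")
    inner = id_tuple[1:-1]
    fields, depth, start = [], 0, 0
    for i, ch in enumerate(inner):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
        elif ch == "," and depth == 0:
            fields.append(inner[start:i])
            start = i + 1
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    fields.append(inner[start:])  # last field after the final top-level comma
    return fields


def parse_dataset_urn(urn: str, namespace: str = "li") -> DatasetUrnParts:
    """Parse urn:<ns>:dataset:(<platform URN>,<name>,<fabric>)."""
    prefix = f"urn:{namespace}:dataset:"
    if not urn.startswith(prefix):
        raise ValueError(f"not a dataset URN: {urn!r}")
    fields = split_top_level(urn[len(prefix):])
    if len(fields) != 3:
        raise ValueError(f"expected 3 ID fields, got {len(fields)}")
    platform, name, fabric = fields
    if not platform.startswith(f"urn:{namespace}:dataPlatform:"):
        raise ValueError(f"platform must be a dataPlatform URN: {platform!r}")
    return DatasetUrnParts(platform, name, fabric)
```

Tracking depth rather than splitting on every comma keeps this correct even if an ID field is itself a multi-field nested URN.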