Merge branch 'datahub-project:master' into master
anshbansal authored Aug 2, 2024
2 parents 1ad6aad + f2e461e commit e86bca2
Showing 7 changed files with 44 additions and 16 deletions.
16 changes: 6 additions & 10 deletions docs/lineage/airflow.md
@@ -17,7 +17,7 @@ There's two actively supported implementations of the plugin, with different Air

| Approach | Airflow Version | Notes |
| --------- | --------------- | --------------------------------------------------------------------------- |
| Plugin v2 | 2.3+ | Recommended. Requires Python 3.8+ |
| Plugin v2 | 2.3.4+ | Recommended. Requires Python 3.8+ |
| Plugin v1 | 2.1+ | No automatic lineage extraction; may not extract lineage if the task fails. |

If you're using Airflow older than 2.1, it's possible to use the v1 plugin with older versions of `acryl-datahub-airflow-plugin`. See the [compatibility section](#compatibility) for more details.
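The compatibility table above can be read as a small decision rule. The following is a minimal sketch, assuming the `2.3.4+` row is the updated requirement for Plugin v2; `parse_version`, `pick_plugin`, and the `"v1-legacy"` label are hypothetical names used only for illustration and are not part of the plugin.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric components, e.g. '2.3.4rc1' -> (2, 3, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def pick_plugin(airflow_version: str, python_version: Tuple[int, int]) -> str:
    """Return which plugin approach the table suggests (illustrative labels)."""
    airflow = parse_version(airflow_version)
    # Plugin v2: Airflow 2.3.4+ and Python 3.8+ (assumed from the table).
    if airflow >= (2, 3, 4) and python_version >= (3, 8):
        return "v2"
    # Plugin v1: Airflow 2.1+.
    if airflow >= (2, 1):
        return "v1"
    # Older Airflow: v1 with an older acryl-datahub-airflow-plugin release.
    return "v1-legacy"


print(pick_plugin("2.3.4", (3, 8)))
print(pick_plugin("2.3.3", (3, 10)))
```

An Airflow 2.3.3 deployment falls back to v1 under this reading, which is the practical effect of tightening the bound from `2.3+` to `2.3.4+`.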
@@ -66,7 +66,7 @@ enabled = True # default
```

| Name | Default value | Description |
|----------------------------|----------------------|------------------------------------------------------------------------------------------|
| -------------------------- | -------------------- | ---------------------------------------------------------------------------------------- |
| enabled | true | If the plugin should be enabled. |
| conn_id | datahub_rest_default | The name of the datahub rest connection. |
| cluster | prod | name of the airflow cluster, this is equivalent to the `env` of the instance |
@@ -132,7 +132,7 @@ conn_id = datahub_rest_default # or datahub_kafka_default
```

| Name | Default value | Description |
|----------------------------|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| -------------------------- | -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| enabled | true | If the plugin should be enabled. |
| conn_id | datahub_rest_default | The name of the datahub connection you set in step 1. |
| cluster | prod | name of the airflow cluster |
@@ -240,6 +240,7 @@ See this [example PR](https://github.com/datahub-project/datahub/pull/10452) whi
There might be a case where the DAGs are removed from the Airflow but the corresponding pipelines and tasks are still there in the Datahub, let's call such pipelines ans tasks, `obsolete pipelines and tasks`

Following are the steps to cleanup them from the datahub:

- create a DAG named `Datahub_Cleanup`, i.e.

```python
@@ -263,8 +264,8 @@ with DAG(
)

```
- ingest this DAG, and it will remove all the obsolete pipelines and tasks from the Datahub based on the `cluster` value set in the `airflow.cfg`

- ingest this DAG, and it will remove all the obsolete pipelines and tasks from the Datahub based on the `cluster` value set in the `airflow.cfg`

## Get all dataJobs associated with a dataFlow

@@ -274,12 +275,7 @@ If you are looking to find all tasks (aka DataJobs) that belong to a specific pi
query {
dataFlow(urn: "urn:li:dataFlow:(airflow,db_etl,prod)") {
childJobs: relationships(
input: {
types: ["IsPartOf"],
direction: INCOMING,
start: 0,
count: 100
}
input: { types: ["IsPartOf"], direction: INCOMING, start: 0, count: 100 }
) {
total
relationships {
8 changes: 8 additions & 0 deletions docs/quick-ingestion-guides/tableau/setup.md
@@ -51,6 +51,14 @@ In order to configure ingestion from Tableau, you'll first have to enable Tablea
- Open a command prompt as an admin on the initial node (*where TSM is installed*) in the cluster
- Run the command: `tsm maintenance metadata-services enable`

3. **Enable Derived Permissions:** This step is required only when the site is using external assets. For more detail, refer to the tableau documentation [Manage Permissions for External Assets](https://help.tableau.com/current/online/en-us/dm_perms_assets.htm).

Follow the below steps to enable the derived permissions:

- Sign in to Tableau Cloud or Tableau Server as an admin.
- From the left navigation pane, click Settings.
- On the General tab, under Automatic Access to Metadata about Databases and Tables, select the `Automatically grant authorized users access to metadata about databases and tables` check box.


## Next Steps

2 changes: 1 addition & 1 deletion metadata-ingestion-modules/airflow-plugin/setup.py
@@ -53,7 +53,7 @@ def get_long_description():
mypy_stubs = {
"types-dataclasses",
"sqlalchemy-stubs",
"types-pkg_resources",
"types-setuptools",
"types-six",
"types-python-dateutil",
"types-requests",
4 changes: 2 additions & 2 deletions metadata-ingestion-modules/dagster-plugin/setup.py
@@ -26,14 +26,14 @@ def get_long_description():
"dagit >= 1.3.3",
*rest_common,
# Ignoring the dependency below because it causes issues with the vercel built wheel install
#f"acryl-datahub[datahub-rest]{_self_pin}",
# f"acryl-datahub[datahub-rest]{_self_pin}",
"acryl-datahub[datahub-rest]",
}

mypy_stubs = {
"types-dataclasses",
"sqlalchemy-stubs",
"types-pkg_resources",
"types-setuptools",
"types-six",
"types-python-dateutil",
"types-requests",
9 changes: 9 additions & 0 deletions metadata-ingestion/docs/sources/tableau/tableau_pre.md
@@ -81,3 +81,12 @@ This may happen when the Tableau API returns NODE_LIMIT_EXCEEDED error in respon

- reducing the page size using the `page_size` config param in datahub recipe (Defaults to 10).
- increasing tableau configuration [metadata query node limit](https://help.tableau.com/current/server/en-us/cli_configuration-set_tsm.htm#metadata_nodelimit) to higher value.

### `PERMISSIONS_MODE_SWITCHED` error in ingestion report
This error occurs if the Tableau site is using external assets. For more detail, refer to the Tableau documentation [Manage Permissions for External Assets](https://help.tableau.com/current/online/en-us/dm_perms_assets.htm).

Follow the below steps to enable the derived permissions:

1. Sign in to Tableau Cloud or Tableau Server as an admin.
2. From the left navigation pane, click Settings.
3. On the General tab, under Automatic Access to Metadata about Databases and Tables, select the `Automatically grant authorized users access to metadata about databases and tables` check box.
2 changes: 1 addition & 1 deletion metadata-ingestion/setup.py
@@ -491,7 +491,7 @@

mypy_stubs = {
"types-dataclasses",
"types-pkg_resources",
"types-setuptools",
"types-six",
"types-python-dateutil",
# We need to avoid 2.31.0.5 and 2.31.0.4 due to
19 changes: 17 additions & 2 deletions metadata-ingestion/src/datahub/ingestion/source/mode.py
@@ -135,9 +135,14 @@ class ModeConfig(StatefulIngestionConfigBase, DatasetLineageProviderConfigBase):
connect_uri: str = Field(
default="https://app.mode.com", description="Mode host URL."
)
token: str = Field(description="Mode user token.")
token: str = Field(
description="When creating workspace API key this is the 'Key ID'."
)
password: pydantic.SecretStr = Field(
description="Mode password for authentication."
description="When creating workspace API key this is the 'Secret'."
)
exclude_restricted: bool = Field(
default=False, description="Exclude restricted collections"
)

workspace: str = Field(
@@ -522,6 +527,16 @@ def _get_space_name_and_tokens(self) -> dict:
for s in spaces:
logger.debug(f"Space: {s.get('name')}")
space_name = s.get("name", "")
# Using both restricted and default_access_level because
# there is a current bug with restricted returning False everytime
# which has been reported to Mode team
if self.config.exclude_restricted and (
s.get("restricted") or s.get("default_access_level") == "restricted"
):
logging.debug(
f"Skipping space {space_name} due to exclude restricted"
)
continue
if not self.config.space_pattern.allowed(space_name):
self.report.report_dropped_space(space_name)
logging.debug(f"Skipping space {space_name} due to space pattern")
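
The `exclude_restricted` check in the hunk above can be exercised in isolation. This is a minimal sketch rather than the source's actual method: `is_restricted`, `filter_spaces`, and the sample payloads (including the `"view"` access level) are hypothetical, and only the `restricted` and `default_access_level` keys come from the diff.

```python
from typing import Any, Dict, Iterable, List


def is_restricted(space: Dict[str, Any]) -> bool:
    """True if either signal marks the space as restricted.

    Both keys are checked because, per the comment in the diff, the
    `restricted` flag alone may always come back False.
    """
    return bool(space.get("restricted")) or (
        space.get("default_access_level") == "restricted"
    )


def filter_spaces(
    spaces: Iterable[Dict[str, Any]], exclude_restricted: bool = False
) -> List[str]:
    """Return the names of spaces that survive the restricted filter."""
    kept: List[str] = []
    for space in spaces:
        if exclude_restricted and is_restricted(space):
            continue  # mirrors the `continue` in the diff
        kept.append(space.get("name", ""))
    return kept


# Hypothetical API payloads for illustration only.
sample_spaces = [
    {"name": "Public", "restricted": False, "default_access_level": "view"},
    {"name": "Finance", "restricted": False, "default_access_level": "restricted"},
    {"name": "Legal", "restricted": True},
]

print(filter_spaces(sample_spaces, exclude_restricted=True))
```

With the default `exclude_restricted=False` every space is kept, so existing recipes behave as before unless the new option is turned on.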
