Add dbt docs natively in Airflow via plugin (#737)
## Description

This PR adds a plugin (via the Airflow plugins entrypoint) that adds a
menu item inside of `Browse` that renders the dbt docs:


![image](https://github.com/astronomer/astronomer-cosmos/assets/31971762/77b5e8d6-ada5-484c-b463-a01352ab61f6)

And this is what it looks like (this example is running inside the dev
Docker Compose setup):

<img width="1627" alt="image"
src="https://github.com/astronomer/astronomer-cosmos/assets/31971762/387bd139-cd3f-4e57-90e6-aea7c426092d">

The docs are rendered via an iframe with some additional hacks to make
the page render in a user-friendly way. I chose an iframe over vendoring
the `index.html` in the templates for a few reasons, but mostly to
support custom `{% block __overview__ %}` text. However, extracting the
text from `index.html` and rendering it in a custom page is certainly an
option too.
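In practice, the "hacks" are a string rewrite of the served `index.html`. The view inserts an inline sizing script and the iframe-resizer content-window script just before the first closing `</head>` tag. Below is a minimal sketch of that rewrite. The helper name `inject_into_head` and the script URL are hypothetical placeholders, not part of the plugin.

```python
def inject_into_head(html: str, snippet: str) -> str:
    """Insert `snippet` immediately before the first closing </head> tag.

    Like the plugin, this replaces only the first occurrence; a page without
    </head> is returned unchanged.
    """
    return html.replace("</head>", f"{snippet}</head>", 1)


# Placeholder URL (hypothetical); in the plugin the real one comes from url_for().
script_tag = '<script src="/static/iframeResizer.contentWindow.min.js"></script>'
page = "<html><head><title>dbt Docs</title></head><body></body></html>"

print(inject_into_head(page, script_tag))
```

The replacement is limited to the first match by the `1` argument to `str.replace`. Any later `</head>` text in the page, for example inside an inline script, is left untouched.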

The dbt docs are specified in the Airflow config with the following
parameters:

```ini
[cosmos]
dbt_docs_dir = path/to/docs/here
dbt_docs_conn_id = my_conn_id
```
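The plugin reads both options with `conf.get("cosmos", ..., fallback=None)`. An unset `dbt_docs_dir` is what triggers the "not set up" page. Airflow's `conf` object is built on Python's `configparser`, so the fallback behaviour can be sketched with the stdlib parser alone. The bucket path below is a made-up example value, not a Cosmos default.

```python
import configparser

# Hypothetical stand-in for the [cosmos] section of airflow.cfg.
AIRFLOW_CFG = """
[cosmos]
dbt_docs_dir = s3://my-bucket/path/to/docs
dbt_docs_conn_id = my_conn_id
"""

parser = configparser.ConfigParser()
parser.read_string(AIRFLOW_CFG)

# fallback=None mirrors how the plugin reads its options: a missing key
# (or section) yields None instead of raising an error.
docs_dir = parser.get("cosmos", "dbt_docs_dir", fallback=None)
conn_id = parser.get("cosmos", "dbt_docs_conn_id", fallback=None)
missing = parser.get("cosmos", "not_a_key", fallback=None)

print(docs_dir)  # s3://my-bucket/path/to/docs
print(conn_id)   # my_conn_id
print(missing)   # None
```

Like any Airflow option, these can also be supplied as environment variables, e.g. `AIRFLOW__COSMOS__DBT_DOCS_DIR` and `AIRFLOW__COSMOS__DBT_DOCS_CONN_ID`.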

Note that the path can point to any of the following:

- S3 (`s3://`)
- Azure Blob Storage (`wasb://`)
- Google Cloud Storage (`gs://`)
- HTTP/HTTPS (`http://`, `https://`)
- Local storage (any other path)
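The backend that serves a given path is decided purely by its prefix. The plugin's `open_file` does this with `str.startswith` checks. The sketch below swaps in `urllib.parse.urlsplit` to extract the scheme instead. The `backend_for` helper and its labels are hypothetical and exist only for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical labels; the plugin calls a different Airflow hook per scheme.
BACKENDS = {
    "s3": "S3 (S3Hook)",
    "gs": "Google Cloud Storage (GCSHook)",
    "wasb": "Azure Blob Storage (WasbHook)",
    "http": "HTTP (HttpHook)",
    "https": "HTTP (HttpHook)",
}


def backend_for(path: str) -> str:
    """Return which backend would serve `path`; anything else is a local file."""
    scheme = urlsplit(path.strip()).scheme
    return BACKENDS.get(scheme, "local filesystem")


for p in ["s3://bucket/docs", "gs://bucket/docs", "wasb://container/docs",
          "https://example.com/docs", "/opt/airflow/docs"]:
    print(f"{p} -> {backend_for(p)}")
```

One difference from the prefix checks is that `urlsplit` lowercases the scheme. A path like `S3://...` would match S3 here but fall through to the local-file branch in the plugin.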

This is designed to work with the operators that dump the dbt docs, and
the documentation changes I added make that clear.

Lastly, if the docs are not set up (that is, `dbt_docs_dir` is not set),
a message tells the user that they should set up their dbt docs:

<img width="816" alt="image"
src="https://github.com/astronomer/astronomer-cosmos/assets/31971762/b385275a-c618-46a1-b36d-1148c1b5706e">

### Current limitations

- Most importantly, **I need help testing the S3 / Azure / GCS
integrations.** I _think_ I got them right but I'll need someone to
actually try them.
- **I also wouldn't mind some help testing the UI on more browsers.**
I've tested both Firefox and Chrome.
- **The iframe hack is less than ideal; I would prefer the dbt docs to
have a fixed height**, so that the page uses the scroll bar of the dbt
docs UI instead of the scroll bar of the Airflow UI. The issue is
basically that I am not an HTML/CSS/JavaScript person. I don't see any
reason this shouldn't be possible, so I can keep looking into it while
the PR is reviewed, or someone else can take it over.
- I cannot run tests locally (there are lots of issues; mostly, the
Databricks DAG in `dev/dags/` fails locally), so I don't know whether
the test suite passes. I was planning to let GitHub Actions run it.

### API Decisions

The core maintainers of the repo should provide some feedback on a few
high level API decisions:

- **Config variable names:** Let me know if `dbt_docs_dir` and
`dbt_docs_conn_id` are appropriate names. Alternatives could be
`dbt_docs_path`, `dbt_docs_dir_conn_id`, `dbt_docs_path_conn_id`, etc.
- **Location in UI:** I considered two options: (a) adding a menu button
called Cosmos with dbt docs underneath, or (b) adding it under Browse.
Ultimately I chose option (b).

## Related Issue(s)

Closes #571.

## Breaking Change?

This PR should not cause any breaking changes.

## Checklist

- [x] I have made corresponding changes to the documentation (if
required)
- [x] I have added tests that prove my fix is effective or that my
feature works

---------

Co-authored-by: Tatiana Al-Chueyr <[email protected]>
Co-authored-by: Justin Bandoro <[email protected]>
3 people authored Feb 20, 2024
1 parent 5995e6d commit 11ff713
Showing 15 changed files with 603 additions and 2 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
```diff
@@ -33,7 +33,7 @@ repos:
         types: [text]
         args:
           - --exclude-file=tests/sample/manifest_model_version.json
-          - --skip=**/manifest.json
+          - --skip=**/manifest.json,**.min.js
           - -L connexion,aci
   - repo: https://github.com/pre-commit/pygrep-hooks
     rev: v1.10.0
```
202 changes: 202 additions & 0 deletions cosmos/plugin/__init__.py
@@ -0,0 +1,202 @@
```python
import os.path as op
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlsplit

from airflow.configuration import conf
from airflow.plugins_manager import AirflowPlugin
from airflow.security import permissions
from airflow.www.auth import has_access
from airflow.www.views import AirflowBaseView
from flask import abort, url_for
from flask_appbuilder import AppBuilder, expose


def bucket_and_key(path: str) -> Tuple[str, str]:
    parsed_url = urlsplit(path)
    bucket = parsed_url.netloc
    key = parsed_url.path.lstrip("/")
    return bucket, key


def open_s3_file(conn_id: Optional[str], path: str) -> str:
    from airflow.providers.amazon.aws.hooks.s3 import S3Hook

    if conn_id is None:
        conn_id = S3Hook.default_conn_name

    hook = S3Hook(aws_conn_id=conn_id)
    bucket, key = bucket_and_key(path)
    content = hook.read_key(key=key, bucket_name=bucket)
    return content  # type: ignore[no-any-return]


def open_gcs_file(conn_id: Optional[str], path: str) -> str:
    from airflow.providers.google.cloud.hooks.gcs import GCSHook

    if conn_id is None:
        conn_id = GCSHook.default_conn_name

    hook = GCSHook(gcp_conn_id=conn_id)
    bucket, blob = bucket_and_key(path)
    content = hook.download(bucket_name=bucket, object_name=blob)
    return content.decode("utf-8")  # type: ignore[no-any-return]


def open_azure_file(conn_id: Optional[str], path: str) -> str:
    from airflow.providers.microsoft.azure.hooks.wasb import WasbHook

    if conn_id is None:
        conn_id = WasbHook.default_conn_name

    hook = WasbHook(wasb_conn_id=conn_id)

    container, blob = bucket_and_key(path)
    content = hook.read_file(container_name=container, blob_name=blob)
    return content  # type: ignore[no-any-return]


def open_http_file(conn_id: Optional[str], path: str) -> str:
    from airflow.providers.http.hooks.http import HttpHook

    if conn_id is None:
        conn_id = ""

    hook = HttpHook(method="GET", http_conn_id=conn_id)
    res = hook.run(endpoint=path)
    hook.check_response(res)
    return res.text  # type: ignore[no-any-return]


def open_file(path: str) -> str:
    """Retrieve a file from http, https, gs, s3, or wasb."""
    conn_id: Optional[str] = conf.get("cosmos", "dbt_docs_conn_id", fallback=None)

    if path.strip().startswith("s3://"):
        return open_s3_file(conn_id=conn_id, path=path)
    elif path.strip().startswith("gs://"):
        return open_gcs_file(conn_id=conn_id, path=path)
    elif path.strip().startswith("wasb://"):
        return open_azure_file(conn_id=conn_id, path=path)
    elif path.strip().startswith("http://") or path.strip().startswith("https://"):
        return open_http_file(conn_id=conn_id, path=path)
    else:
        with open(path) as f:
            content = f.read()
        return content  # type: ignore[no-any-return]


iframe_script = """
<script>
  function getMaxElement(side, elements_query) {
    var elements = document.querySelectorAll(elements_query)
    var elementsLength = elements.length,
      elVal = 0,
      maxVal = 0,
      Side = capitalizeFirstLetter(side),
      timer = Date.now()
    for (var i = 0; i < elementsLength; i++) {
      elVal =
        elements[i].getBoundingClientRect()[side] +
        getComputedStyleWrapper('margin' + Side, elements[i])
      if (elVal > maxVal) {
        maxVal = elVal
      }
    }
    timer = Date.now() - timer
    chkEventThottle(timer)
    return maxVal
  }
  var throttledTimer = 16
  function chkEventThottle(timer) {
    if (timer > throttledTimer / 2) {
      throttledTimer = 2 * timer
    }
  }
  function capitalizeFirstLetter(string) {
    return string.charAt(0).toUpperCase() + string.slice(1)
  }
  function getComputedStyleWrapper(prop, el) {
    var retVal = 0
    el = el || document.body // Not testable in phantonJS
    retVal = document.defaultView.getComputedStyle(el, null)
    retVal = null === retVal ? 0 : retVal[prop]
    return parseInt(retVal)
  }
  window.iFrameResizer = {
    heightCalculationMethod: function getHeight() {
      return Math.max(
        // Overview page
        getMaxElement('bottom', 'div.panel.panel-default') + 50,
        // Model page
        getMaxElement('bottom', 'section.section') + 75,
        // Search page
        getMaxElement('bottom', 'div.result-body') + 110
      )
    }
  }
</script>
"""


class DbtDocsView(AirflowBaseView):
    default_view = "dbt_docs"
    route_base = "/cosmos"
    template_folder = op.join(op.dirname(__file__), "templates")
    static_folder = op.join(op.dirname(__file__), "static")

    def create_blueprint(
        self, appbuilder: AppBuilder, endpoint: Optional[str] = None, static_folder: Optional[str] = None
    ) -> None:
        # Make sure the static folder is not overwritten, as we want to use it.
        return super().create_blueprint(appbuilder, endpoint=endpoint, static_folder=self.static_folder)  # type: ignore[no-any-return]

    @expose("/dbt_docs")  # type: ignore[misc]
    @has_access([(permissions.ACTION_CAN_READ, permissions.RESOURCE_WEBSITE)])
    def dbt_docs(self) -> str:
        if conf.get("cosmos", "dbt_docs_dir", fallback=None) is None:
            return self.render_template("dbt_docs_not_set_up.html")  # type: ignore[no-any-return,no-untyped-call]
        return self.render_template("dbt_docs.html")  # type: ignore[no-any-return,no-untyped-call]

    @expose("/dbt_docs_index.html")  # type: ignore[misc]
    @has_access([(permissions.ACTION_CAN_READ, permissions.RESOURCE_WEBSITE)])
    def dbt_docs_index(self) -> str:
        docs_dir = conf.get("cosmos", "dbt_docs_dir", fallback=None)
        if docs_dir is None:
            abort(404)
        html = open_file(op.join(docs_dir, "index.html"))
        # Hack the dbt docs to render properly in an iframe
        iframe_resizer_url = url_for(".static", filename="iframeResizer.contentWindow.min.js")
        html = html.replace("</head>", f'{iframe_script}<script src="{iframe_resizer_url}"></script></head>', 1)
        return html

    @expose("/catalog.json")  # type: ignore[misc]
    @has_access([(permissions.ACTION_CAN_READ, permissions.RESOURCE_WEBSITE)])
    def catalog(self) -> Tuple[str, int, Dict[str, Any]]:
        docs_dir = conf.get("cosmos", "dbt_docs_dir", fallback=None)
        if docs_dir is None:
            abort(404)
        data = open_file(op.join(docs_dir, "catalog.json"))
        return data, 200, {"Content-Type": "application/json"}

    @expose("/manifest.json")  # type: ignore[misc]
    @has_access([(permissions.ACTION_CAN_READ, permissions.RESOURCE_WEBSITE)])
    def manifest(self) -> Tuple[str, int, Dict[str, Any]]:
        docs_dir = conf.get("cosmos", "dbt_docs_dir", fallback=None)
        if docs_dir is None:
            abort(404)
        data = open_file(op.join(docs_dir, "manifest.json"))
        return data, 200, {"Content-Type": "application/json"}


dbt_docs_view = DbtDocsView()


class CosmosPlugin(AirflowPlugin):
    name = "cosmos"
    appbuilder_views = [{"name": "dbt Docs", "category": "Browse", "view": dbt_docs_view}]
```
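This sketch makes the view's path handling concrete. It applies a standalone copy of `bucket_and_key` to a joined docs path. The bucket name and prefix are hypothetical. The sketch uses `posixpath.join` where the plugin uses `os.path.join`; the two behave the same on POSIX hosts.

```python
import posixpath
from typing import Tuple
from urllib.parse import urlsplit


def bucket_and_key(path: str) -> Tuple[str, str]:
    # Same logic as the plugin helper: the netloc is the bucket/container and
    # the path, minus its leading slash, is the object key/blob name.
    parsed_url = urlsplit(path)
    return parsed_url.netloc, parsed_url.path.lstrip("/")


docs_dir = "s3://my-bucket/dbt/target"  # hypothetical dbt_docs_dir value
index_path = posixpath.join(docs_dir, "index.html")

print(index_path)                  # s3://my-bucket/dbt/target/index.html
print(bucket_and_key(index_path))  # ('my-bucket', 'dbt/target/index.html')
```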