Commit

docs(): Updating docs for assertions to correct databricks assertions support (#9713)

Co-authored-by: John Joyce <[email protected]>
jjoyce0510 and John Joyce authored Jan 25, 2024
1 parent a78c689 commit caf6ebe
Showing 2 changed files with 11 additions and 6 deletions.
2 changes: 1 addition & 1 deletion docs/managed-datahub/observe/custom-sql-assertions.md
@@ -117,7 +117,7 @@ The **Assertion Description**: This is a human-readable description of the Asser
### Prerequisites

1. **Permissions**: To create or delete Custom SQL Assertions for a specific entity on DataHub, you'll need to be granted the
`Edit Assertions` and `Edit Monitors` privileges for the entity. This is granted to Entity owners by default.
`Edit Assertions`, `Edit Monitors`, **and the additional `Edit SQL Assertion Monitors`** privileges for the entity. This is granted to Entity owners by default.

2. **Data Platform Connection**: In order to create a Custom SQL Assertion, you'll need to have an **Ingestion Source** configured to your
Data Platform: Snowflake, BigQuery, or Redshift under the **Integrations** tab.
15 changes: 10 additions & 5 deletions docs/managed-datahub/observe/freshness-assertions.md
@@ -107,12 +107,14 @@ Change Source types vary by the platform, but generally fall into these categori

- **Audit Log** (Default): A metadata API or Table that is exposed by the Data Warehouse which contains captures information about the
operations that have been performed to each Table. It is usually efficient to check, but some useful operations are not
fully supported across all major Warehouse platforms.
fully supported across all major Warehouse platforms. Note that for Databricks, [this option](https://docs.databricks.com/en/delta/history.html)
is only available for tables stored in Delta format.

- **Information Schema**: A system Table that is exposed by the Data Warehouse which contains live information about the Databases
and Tables stored inside the Data Warehouse. It is usually efficient to check, but lacks detailed information about the _type_
of change that was last made to a specific table (e.g. the operation itself - INSERT, UPDATE, DELETE, number of impacted rows, etc)

of change that was last made to a specific table (e.g. the operation itself - INSERT, UPDATE, DELETE, number of impacted rows, etc).
Note that for Databricks, [this option](https://docs.databricks.com/en/delta/table-details.html) is only available for tables stored in Delta format.

- **Last Modified Column**: A Date or Timestamp column that represents the last time that a specific _row_ was touched or updated.
Adding a Last Modified Column to each warehouse Table is a pattern is often used for existing use cases around change management.
If this change source is used, a query will be issued to the Table to search for rows that have been modified within a specific
@@ -128,8 +130,11 @@ Change Source types vary by the platform, but generally fall into these categori
This relies on Operations being reported to DataHub, either via ingestion or via use of the DataHub APIs (see [Report Operation via API](#reporting-operations-via-api)).
Note if you have not configured an ingestion source through DataHub, then this may be the only option available. By default, any operation type found will be considered a valid change. Use the **Operation Types** dropdown when selecting this option to specify which operation types should be considered valid changes. You may choose from one of DataHub's standard Operation Types, or specify a "Custom" Operation Type by typing in the name of the Operation Type.

Using either of the column value approaches (**Last Modified Column** or **High Watermark Column**) to determine whether a Table has changed can be useful because it can be customized to determine whether specific types of important changes have been made to a given Table.
Because it does not involve system warehouse tables, it is also easily portable across Data Warehouse and Data Lake providers.
- **File Metadata** (Databricks Only): A column that is exposed by Databricks for both Unity Catalog and Hive Metastore based tables
which includes information about the last time that a file for the table was changed. Read more about it [here](https://docs.databricks.com/en/ingestion/file-metadata-column.html).

Using either of the column value approaches (**Last Modified Column** or **High Watermark Column**) to determine whether a Table has changed can be useful because it can be customized to determine whether specific types of changes have been made to a given Table.
And because this type of assertion does not involve system warehouse tables, they are easily portable across Data Warehouse and Data Lake providers.
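
Editor's note, not part of the commit: the column value approach described above can be sketched as a plain SQL check. This is a minimal illustration and not DataHub's implementation. It assumes the check looks for rows whose last-modified value falls inside a lookback window; the `orders` table, the `updated_at` column, and the `changed_within_window` helper are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3
from datetime import datetime, timedelta


def changed_within_window(conn, table, column, window_end, window):
    """Return True if any row's last-modified value lies in (window_end - window, window_end]."""
    window_start = window_end - window
    # Identifiers cannot be bound as parameters, so they are quoted inline.
    # Hypothetical helper: only pass trusted table and column names.
    query = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" > ? AND "{column}" <= ?'
    params = (window_start.isoformat(), window_end.isoformat())
    (count,) = conn.execute(query, params).fetchone()
    return count > 0


# Stand-in for a warehouse table with a last-modified column (ISO-8601 text).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, updated_at TEXT)")
conn.execute("INSERT INTO orders VALUES (1, ?)", ("2024-01-25T09:30:00",))

now = datetime(2024, 1, 25, 12, 0, 0)
fresh = changed_within_window(conn, "orders", "updated_at", now, timedelta(hours=6))
stale = changed_within_window(conn, "orders", "updated_at", now, timedelta(hours=1))
```

Because the query touches only the table's own column, the same check carries over to any engine that supports basic SQL comparisons, which is the portability point made above.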

Freshness Assertions also have an off switch: they can be started or stopped at any time with the click of button.
