
Solution Accelerator for handling sensitive data on the Databricks Data Intelligence Platform

For a detailed conceptual understanding of handling sensitive data on the Databricks Data Intelligence Platform, please refer to the following blog post:

Path to Data Protection and Compliance with Databricks Data Intelligence Platform

Requirements:

  1. A Unity Catalog-enabled Databricks workspace
  2. A user with workspace admin privileges
  3. A Unity Catalog-enabled cluster running DBR 13.3 or above

Contents

The solution accelerator includes sample scripts for the following tasks:

  1. Generate fake PII data using the Python faker library (a rough sketch follows this list)
  2. Detect and tag PII in Unity Catalog-governed tables using the Databricks Labs project DiscoverX together with Presidio
  3. Encrypt columns, including free-text columns
  4. Apply dynamic column-level masking
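
As a rough illustration of items 1 and 3, the sketch below generates fake PII with the faker library and encrypts one column with a Fernet UDF, run from a Databricks notebook where `spark` is predefined. The table name, column names, and key handling are illustrative only; in practice the key should come from a Databricks secret scope, and the repository's notebooks remain the authoritative version.

```python
# Illustrative sketch only; the repository's notebooks are the authoritative version.
# Requires the faker and cryptography libraries (e.g. %pip install faker cryptography).
from faker import Faker
from cryptography.fernet import Fernet
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

fake = Faker()

# Item 1: generate fake PII rows, including a free-text column that embeds PII.
rows = [
    (fake.name(), fake.email(), f"Call {fake.name()} at {fake.phone_number()}")
    for _ in range(1000)
]
df = spark.createDataFrame(rows, ["name", "email", "freetext"])

# Item 3: encrypt a column with Fernet. In a real setup, load the key from a
# secret scope, e.g. dbutils.secrets.get("diz", "fernet_key") (hypothetical names),
# rather than generating it inline.
key = Fernet.generate_key()

@udf(returnType=StringType())
def encrypt(value):
    return Fernet(key).encrypt(value.encode()).decode() if value is not None else None

df.withColumn("email", encrypt(col("email"))) \
  .write.mode("overwrite").saveAsTable("example.default.fake_pii_bronze")
```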

How to run

  1. Clone the repository into a Databricks workspace repo.

  2. Create UC catalogs and schemas, for example (a scripted alternative follows this list):

  • DIZ Catalog: example
  • DIZ Schema: default
  • Dev Catalog: example_dev
  • Dev Schema: default
  • Prod Catalog: example_prod
  • Prod Schema: default
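
If you prefer to script this step instead of using the UI, a minimal sketch from a UC-enabled notebook (names match the examples above; depending on your metastore setup you may also need to specify a managed location):

```python
# Create the example catalogs and schemas used by the accelerator.
# Adjust the names to your environment.
for catalog in ["example", "example_dev", "example_prod"]:
    spark.sql(f"CREATE CATALOG IF NOT EXISTS {catalog}")
    spark.sql(f"CREATE SCHEMA IF NOT EXISTS {catalog}.default")
```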
  3. Create a privileged-users account group (example: prod-privileged-users) and add the relevant users. This group is used for fine-grained access controls; a masking sketch follows below.
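
As a rough illustration of how the group enables fine-grained access (item 4 under Contents), a Unity Catalog column mask can reveal a column only to members of the group. The table and column names below are placeholders, not necessarily the ones the notebooks use:

```python
# Standard Unity Catalog column-mask SQL; the table and column names are hypothetical.
spark.sql("""
    CREATE OR REPLACE FUNCTION example_prod.default.mask_pii(value STRING)
    RETURNS STRING
    RETURN CASE
        WHEN is_account_group_member('prod-privileged-users') THEN value
        ELSE '***REDACTED***'
    END
""")
spark.sql("""
    ALTER TABLE example_prod.default.customers_silver
    ALTER COLUMN email SET MASK example_prod.default.mask_pii
""")
```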

  4. Run the notebooks, using one of the options below:

    Option 1: Run the notebooks manually in sequence.

    • Run all the notebooks (except notebook 4. Prod CLM enforcement.py) on a cluster in Assigned (Single User) access mode running DBR 13.3 or above.

    Option 2: Create a workflow with one task per notebook.

    • Set the following job parameters:

      Name                   Value
      diz_catalog            example
      diz_schema             default
      num_rows               1000
      prod_catalog           example_prod
      dev_catalog            example_dev
      target_schema          default
      free_text              freetext
      privileged_group_name  prod-privileged-users
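
Inside a notebook task, such parameters are typically read through widgets; a minimal sketch using two of the parameters above (the repository's notebooks may read them differently):

```python
# Declare widgets with defaults matching the job-parameter table, then read them.
dbutils.widgets.text("diz_catalog", "example")
dbutils.widgets.text("num_rows", "1000")

diz_catalog = dbutils.widgets.get("diz_catalog")
num_rows = int(dbutils.widgets.get("num_rows"))
```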

    Option 3: Create a workflow from the JSON definition using the Jobs API create endpoint or the Databricks CLI. Note that recreating this job requires updating the highlighted identifiers with the right values.

     3a. Copy the contents of the [JSON definition](workflow/create_databricks_job.json) file and replace the string "[email protected]" with your Databricks username.

     3b. Create the job via the Databricks CLI or the Databricks REST API using the JSON (example: `databricks jobs create --json '<json content>'`).
    

    [Workflow screenshot]

  5. Observe the results: browse the bronze and silver tables in the specified catalog/schema (a quick inspection snippet is shown below).
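
For example, from a notebook (the silver table name here is a placeholder; actual names depend on what the pipeline created):

```python
# List the tables the pipeline produced, then peek at one of them.
display(spark.sql("SHOW TABLES IN example_prod.default"))
display(spark.table("example_prod.default.fake_pii_silver").limit(10))
```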

Disclaimer:

  • The views/opinions expressed here are our own and do not necessarily represent the views/opinions of Databricks.
  • The sample code provided is intended to aid in getting started and may not be production-ready. The code comes with no guarantees, warranties, or support. Use it at your own risk.
