Added masking_strategy_override at field level #5446

Merged on Nov 18, 2024 (42 commits)
42 commits
892f807
added masking_strategy_override at field level
Linker44 Nov 1, 2024
cf068a3
added testing
Linker44 Nov 4, 2024
605138f
run static checks
Linker44 Nov 4, 2024
a64592e
add requirements.txt
Linker44 Nov 4, 2024
a704b40
fix broken tests
Linker44 Nov 4, 2024
f9e6544
fix more broken tests
Linker44 Nov 4, 2024
41e76a0
implicit empty configuration on null_rewrite
Linker44 Nov 4, 2024
17732e5
fixed typing errors
Linker44 Nov 5, 2024
148e80a
added testing for db integrations
Linker44 Nov 5, 2024
90735bb
added validation for dataset creation and update
Linker44 Nov 5, 2024
8d40bc4
run static_checks
Linker44 Nov 5, 2024
1fcaa08
improve testing and move validations
Linker44 Nov 6, 2024
488718b
improve router_factory redability
Linker44 Nov 6, 2024
957b485
Merge branch 'main' into field_level_masking
Linker44 Nov 7, 2024
dcfc835
fix: edit fideslang requirement
Linker44 Nov 7, 2024
51fe052
Merge branch 'main' into field_level_masking
Linker44 Nov 7, 2024
7d24f82
change commit in requirements.txt
Linker44 Nov 7, 2024
648cd18
fix using wrong commit hash in requirements.txt
Linker44 Nov 7, 2024
aecd1c2
fixed comments
Linker44 Nov 7, 2024
f509649
Merge branch 'main' into field_level_masking
Linker44 Nov 11, 2024
0a98510
update requirements.txt to new fideslang release
Linker44 Nov 11, 2024
63ea347
corrected variable names, deleted repeated filter_data_category function
Linker44 Nov 12, 2024
f48b1cc
improved readability and added docstring to validate_masking_strategies
Linker44 Nov 12, 2024
57654ad
added testing for masking override put and happy path for update
Linker44 Nov 12, 2024
47711ea
added changelog
Linker44 Nov 12, 2024
294d964
Merge branch 'main' into field_level_masking
Linker44 Nov 12, 2024
dbf10d9
Update tests/ops/service/connectors/test_queryconfig.py
galvana Nov 12, 2024
8e14528
remove failing log
Linker44 Nov 12, 2024
430a494
fixed bigquery test
Linker44 Nov 13, 2024
de55a34
added log warning for when applying field-level override
Linker44 Nov 13, 2024
b88d34f
Merge branch 'main' into field_level_masking
Linker44 Nov 13, 2024
a4d79de
Merge branch 'main' into field_level_masking
galvana Nov 13, 2024
6885d77
Adding verbose logging and enabling traceback
galvana Nov 13, 2024
f61eabe
Adding more logging
galvana Nov 13, 2024
d8c454c
Adding timeout to external tests
galvana Nov 13, 2024
03525e0
Adding timeout to admin DB actions
galvana Nov 14, 2024
5c16eaa
Removing yield from setup_ctl_db fixture
galvana Nov 14, 2024
f91ae25
Merge branch 'main' into field_level_masking
galvana Nov 14, 2024
1d0382c
Updating fixture
galvana Nov 16, 2024
91018d3
Added changelog change
Linker44 Nov 18, 2024
893c4cd
Merge branch 'main' into field_level_masking
Linker44 Nov 18, 2024
b315095
revert nox pytest ctl command change
Linker44 Nov 18, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -19,6 +19,7 @@ The types of changes are:

### Added
- Added namespace support for Snowflake [#5486](https://github.com/ethyca/fides/pull/5486)
- Added support for field-level masking overrides [#5446](https://github.com/ethyca/fides/pull/5446)

### Developer Experience
- Migrated several instances of Chakra's Select component to use Ant's Select component [#5475](https://github.com/ethyca/fides/pull/5475)
21 changes: 14 additions & 7 deletions data/dataset/bigquery_example_test_dataset.yml
@@ -5,7 +5,7 @@ dataset:
collections:
- name: address
fides_meta:
-erase_after: [ bigquery_example_test_dataset.employee ]
+erase_after: [bigquery_example_test_dataset.employee]
fields:
- name: city
data_categories: [user.contact.address.city]
@@ -19,12 +19,18 @@ dataset:
data_categories: [user.contact.address.state]
- name: street
data_categories: [user.contact.address.street]
fides_meta:
data_type: string
masking_strategy_override:
strategy: string_rewrite
configuration:
rewrite_value: REDACTED
- name: zip
data_categories: [user.contact.address.postal_code]

- name: customer
fides_meta:
-erase_after: [ bigquery_example_test_dataset.address ]
+erase_after: [bigquery_example_test_dataset.address]
fields:
- name: address_id
data_categories: [system.operations]
@@ -238,11 +244,12 @@ dataset:
- name: visit_partitioned
fides_meta:
partitioning:
-where_clauses: [
-"`last_visit` > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 500 DAY) AND `last_visit` <= CURRENT_TIMESTAMP()",
-"`last_visit` > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1000 DAY) AND `last_visit` <= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 500 DAY)",
-"`last_visit` <= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1000 DAY)",
-]
+where_clauses:
+[
+"`last_visit` > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 500 DAY) AND `last_visit` <= CURRENT_TIMESTAMP()",
+"`last_visit` > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1000 DAY) AND `last_visit` <= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 500 DAY)",
+"`last_visit` <= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1000 DAY)",
+]
fields:
- name: email
data_categories: [user.contact.email]
252 changes: 252 additions & 0 deletions data/dataset/example_field_masking_override_test_dataset.yml
@@ -0,0 +1,252 @@
dataset:
- fides_key: field_masking_override_test_dataset
name: Field Masking Override Test Dataset
description: Example of a dataset containing masking strategy override at the field-level.
collections:
- name: address
fields:
- name: city
data_categories: [user.contact.address.city]
- name: house
data_categories: [user.contact.address.street]
- name: id
data_categories: [system.operations]
fides_meta:
primary_key: True
- name: state
data_categories: [user.contact.address.state]
- name: street
data_categories: [user.contact.address.street]
- name: zip
data_categories: [user.contact.address.postal_code]

- name: customer
fields:
- name: address_id
data_categories: [system.operations]
fides_meta:
references:
- dataset: field_masking_override_test_dataset
field: address.id
direction: to
- name: created
data_categories: [system.operations]
- name: email
data_categories: [user.contact.email]
fides_meta:
identity: email
data_type: string
- name: id
data_categories: [user.unique_id]
fides_meta:
primary_key: True
- name: name
data_categories: [user.name]
fides_meta:
data_type: string
length: 40
masking_strategy_override:
strategy: random_string_rewrite
configuration:
length: 5
format_preservation:
suffix: "@example.com"
- name: address
fields:
- name: city
data_categories: [user.contact.address.city]
- name: house
data_categories: [user.contact.address.street]
fides_meta:
data_type: string
masking_strategy_override:
strategy: string_rewrite
configuration:
rewrite_value: "1234"
format_preservation:
suffix: "-test"
- name: state
data_categories: [user.contact.address.state]
masking_strategy_override:
strategy: null_rewrite
- name: street
data_categories: [user.contact.address.street]
- name: zip
data_categories: [user.contact.address.postal_code]

- name: employee
fields:
- name: address_id
data_categories: [system.operations]
fides_meta:
references:
- dataset: field_masking_override_test_dataset
field: address.id
direction: to
- name: email
data_categories: [user.contact.email]
fides_meta:
identity: email
data_type: string
- name: id
data_categories: [user.unique_id]
fides_meta:
primary_key: True
- name: name
data_categories: [user.name]
fides_meta:
data_type: string

- name: login
fields:
- name: customer_id
data_categories: [user.unique_id]
fides_meta:
references:
- dataset: field_masking_override_test_dataset
field: customer.id
direction: from
- name: id
data_categories: [system.operations]
fides_meta:
primary_key: True
- name: time
data_categories: [user.sensor]

- name: orders
fields:
- name: customer_id
data_categories: [user.unique_id]
fides_meta:
references:
- dataset: field_masking_override_test_dataset
field: customer.id
direction: from
- name: id
data_categories: [system.operations]
fides_meta:
primary_key: True
- name: shipping_address_id
data_categories: [system.operations]
fides_meta:
references:
- dataset: field_masking_override_test_dataset
field: address.id
direction: to

# order_item
- name: order_item
fields:
- name: order_id
data_categories: [system.operations]
fides_meta:
references:
- dataset: field_masking_override_test_dataset
field: orders.id
direction: from
- name: product_id
data_categories: [system.operations]
fides_meta:
references:
- dataset: field_masking_override_test_dataset
field: product.id
direction: to
- name: quantity
data_categories: [system.operations]

- name: payment_card
fields:
- name: billing_address_id
data_categories: [system.operations]
fides_meta:
references:
- dataset: field_masking_override_test_dataset
field: address.id
direction: to
- name: ccn
data_categories: [user.financial.bank_account]
- name: code
data_categories: [user.financial]
- name: customer_id
data_categories: [user.unique_id]
fides_meta:
references:
- dataset: field_masking_override_test_dataset
field: customer.id
direction: from
- name: id
data_categories: [system.operations]
fides_meta:
primary_key: True
- name: name
data_categories: [user.financial]
- name: preferred
data_categories: [user]

- name: product
fields:
- name: id
data_categories: [system.operations]
fides_meta:
primary_key: True
- name: name
data_categories: [system.operations]
- name: price
data_categories: [system.operations]

- name: report
fields:
- name: email
data_categories: [user.contact.email]
fides_meta:
identity: email
data_type: string
- name: id
data_categories: [system.operations]
fides_meta:
primary_key: True
- name: month
data_categories: [system.operations]
- name: name
data_categories: [system.operations]
- name: total_visits
data_categories: [system.operations]
- name: year
data_categories: [system.operations]

- name: service_request
fields:
- name: alt_email
data_categories: [user.contact.email]
fides_meta:
identity: email
data_type: string
- name: closed
data_categories: [system.operations]
- name: email
data_categories: [system.operations]
fides_meta:
identity: email
data_type: string
- name: employee_id
data_categories: [user.unique_id]
fides_meta:
references:
- dataset: field_masking_override_test_dataset
field: employee.id
direction: from
- name: id
data_categories: [system.operations]
fides_meta:
primary_key: True
- name: opened
data_categories: [system.operations]
- name: visit
fields:
- name: email
data_categories: [user.contact.email]
fides_meta:
identity: email
data_type: string
- name: last_visit
data_categories: [system.operations]
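The dataset above attaches `masking_strategy_override` to individual fields. A minimal sketch of how such an override could take precedence over a policy-wide strategy follows. The helper `resolve_masking_strategy` and the plain-dict representation are hypothetical and are not taken from the fides codebase; only the keys `fides_meta`, `masking_strategy_override`, `strategy`, and `configuration` come from the dataset. The `house` entry is abridged from the file.

```python
from typing import Any, Dict, Optional


def resolve_masking_strategy(
    field: Dict[str, Any], policy_strategy: Dict[str, Any]
) -> Dict[str, Any]:
    """Return the field-level override if present, else the policy-level strategy.

    Hypothetical helper: it illustrates precedence only, not the fides implementation.
    """
    meta: Dict[str, Any] = field.get("fides_meta") or {}
    override: Optional[Dict[str, Any]] = meta.get("masking_strategy_override")
    if override is None:
        return policy_strategy
    # A missing `configuration` is treated as empty here, in the spirit of the
    # "implicit empty configuration on null_rewrite" commit (an assumption).
    return {
        "strategy": override["strategy"],
        "configuration": override.get("configuration") or {},
    }


# Assumed policy-wide default, for illustration only.
policy_default = {"strategy": "null_rewrite", "configuration": {}}

house = {
    "name": "house",
    "fides_meta": {
        "data_type": "string",
        "masking_strategy_override": {
            "strategy": "string_rewrite",
            "configuration": {"rewrite_value": "1234"},
        },
    },
}
city = {"name": "city", "data_categories": ["user.contact.address.city"]}

house_strategy = resolve_masking_strategy(house, policy_default)
city_strategy = resolve_masking_strategy(city, policy_default)
```

In this sketch `house` resolves to its own `string_rewrite` override, while `city`, which declares no override, falls back to the policy default.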
@@ -0,0 +1,43 @@
dataset:
- fides_key: postgres_example_invalid_masking_strategy_override
name: Postgres Example Invalid Masking Strategy Override Test Dataset
description: Example of a Postgres dataset containing an invalid masking strategy override
collections:
- name: customer
fields:
- name: created
data_categories: [system.operations]
- name: email
data_categories: [user.contact.email]
fides_meta:
identity: email
data_type: string
- name: id
data_categories: [user.unique_id]
fides_meta:
primary_key: True
- name: name
data_categories: [user.name]
fides_meta:
data_type: string
length: 40

- name: employee
fields:
- name: email
data_categories: [user.contact.email]
fides_meta:
identity: email
data_type: string
- name: id
data_categories: [user.unique_id]
fides_meta:
primary_key: True
- name: name
data_categories: [user.name]
fides_meta:
data_type: string
masking_strategy_override:
strategy: hash
configuration:
algorithm: SHA-256
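The dataset above is labelled as carrying an invalid override (`hash` on `employee.name`). This diff does not show the body of `validate_masking_strategy_override`, so the following is only a sketch of what a recursive check over a dataset could look like. The set `DISALLOWED_OVERRIDES`, the exception class, and both function names are assumptions made for illustration; `hash` is listed solely because this example dataset is described as invalid.

```python
from typing import Any, Dict, FrozenSet

# Assumption: the real rejection rule is not visible in this diff.
DISALLOWED_OVERRIDES: FrozenSet[str] = frozenset({"hash"})


class InvalidMaskingOverride(ValueError):
    """Raised when a field carries a disallowed masking override."""


def check_field(field: Dict[str, Any], path: str) -> None:
    """Recursively check one field and any nested sub-fields."""
    name = f"{path}.{field['name']}"
    for sub_field in field.get("fields") or []:
        check_field(sub_field, name)
    override = (field.get("fides_meta") or {}).get("masking_strategy_override")
    if override and override.get("strategy") in DISALLOWED_OVERRIDES:
        raise InvalidMaskingOverride(
            f"{name}: strategy '{override['strategy']}' is not allowed as an override"
        )


def check_dataset(dataset: Dict[str, Any]) -> None:
    """Walk every collection and field of a dataset dictionary."""
    for collection in dataset.get("collections", []):
        for field in collection.get("fields", []):
            check_field(field, f"{dataset['fides_key']}.{collection['name']}")


invalid = {
    "fides_key": "postgres_example_invalid_masking_strategy_override",
    "collections": [
        {
            "name": "employee",
            "fields": [
                {
                    "name": "name",
                    "fides_meta": {
                        "data_type": "string",
                        "masking_strategy_override": {
                            "strategy": "hash",
                            "configuration": {"algorithm": "SHA-256"},
                        },
                    },
                }
            ],
        }
    ],
}

try:
    check_dataset(invalid)
    error_message = None
except InvalidMaskingOverride as exc:
    error_message = str(exc)
```

Raising on the first offending field keeps the sketch short; a real validator might instead collect every violation before reporting.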
6 changes: 6 additions & 0 deletions data/dataset/postgres_example_test_dataset.yml
@@ -68,6 +68,12 @@ dataset:
data_categories: [user.name]
fides_meta:
data_type: string
masking_strategy_override:
strategy: string_rewrite
configuration:
rewrite_value: testing
format_preservation:
suffix: "-test"

- name: login
fields:
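The override added to `postgres_example_test_dataset.yml` combines `string_rewrite` with a `format_preservation` suffix. A rough sketch of the intended effect follows, assuming the suffix is appended to the rewritten value and that null values are left untouched. Neither behaviour is shown in this diff, and `apply_string_rewrite` is a hypothetical helper, not a fides function.

```python
from typing import Any, Dict, List, Optional


def apply_string_rewrite(
    values: List[Optional[str]], configuration: Dict[str, Any]
) -> List[Optional[str]]:
    """Replace every non-null value with `rewrite_value` plus the optional suffix.

    Assumption: the `format_preservation` suffix is appended to the masked value.
    """
    rewrite_value: str = configuration["rewrite_value"]
    suffix: str = (configuration.get("format_preservation") or {}).get("suffix", "")
    return [None if value is None else rewrite_value + suffix for value in values]


# Configuration copied from the override in the diff above.
configuration = {
    "rewrite_value": "testing",
    "format_preservation": {"suffix": "-test"},
}
masked = apply_string_rewrite(["Jane Employee", None], configuration)
```

Under these assumptions the non-null input becomes `testing-test` and the null stays null.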
2 changes: 1 addition & 1 deletion requirements.txt
@@ -71,4 +71,4 @@ twilio==7.15.0
typing-extensions==4.12.2
validators==0.20.0
versioneer==0.19
-fideslang==3.0.8
+fideslang==3.0.9
2 changes: 2 additions & 0 deletions src/fides/api/api/v1/endpoints/dataset_endpoints.py
@@ -39,6 +39,7 @@
DatasetConfig,
convert_dataset_to_graph,
to_graph_field,
validate_masking_strategy_override,
)
from fides.api.oauth.utils import verify_oauth_client
from fides.api.schemas.api import BulkUpdateFailed
@@ -417,6 +418,7 @@ def create_or_update_dataset(
# when a ctl_dataset is being linked to a Saas Connector.
_validate_saas_dataset(connection_config, dataset) # type: ignore
# Try to find an existing DatasetConfig matching the given connection & key
validate_masking_strategy_override(dataset)
Review comment from @Linker44 (Contributor, Author), Nov 7, 2024:

Dataset creation and update happen in many places, so this validation gets repeated all over. This is one of those cases where a service would come in handy.

dataset_config = create_method(db, data=data)
created_or_updated.append(dataset_config.ctl_dataset)
except (
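The review comment on `create_or_update_dataset` suggests that a service would avoid repeating the validation call at every dataset creation and update site. One possible shape is sketched below, purely as an illustration: `DatasetService`, the `Validator` alias, and `require_fides_key` are hypothetical and do not appear in this diff, and the in-memory list stands in for the real persistence call.

```python
from typing import Any, Callable, Dict, List

Validator = Callable[[Dict[str, Any]], None]


class DatasetService:
    """Hypothetical service that runs every registered validator before saving."""

    def __init__(self, validators: List[Validator]) -> None:
        self._validators = list(validators)
        self.saved: List[Dict[str, Any]] = []

    def create_or_update(self, dataset: Dict[str, Any]) -> Dict[str, Any]:
        for validate in self._validators:
            validate(dataset)  # each validator raises on invalid input
        self.saved.append(dataset)  # stand-in for the real persistence call
        return dataset


def require_fides_key(dataset: Dict[str, Any]) -> None:
    """Example validator: a dataset must carry a non-empty fides_key."""
    if not dataset.get("fides_key"):
        raise ValueError("dataset is missing fides_key")


service = DatasetService([require_fides_key])
service.create_or_update({"fides_key": "postgres_example_test_dataset"})
```

With this arrangement each endpoint would call the service once, and a new check such as the masking override validation would be registered in a single place.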