Pseudonymization operator - initial discussion #1118
omri374 started this conversation in Design records
Replies: 1 comment
FYI, we now have a sample for pseudonymization: https://github.com/microsoft/presidio/blob/main/docs/samples/python/pseudonomyzation.ipynb
-
DRAFT
Context / Problem Statement
Pseudonymization is a data management and de-identification procedure by which personally identifiable information fields within a data record are replaced by one or more artificial identifiers, or pseudonyms. A single pseudonym for each replaced field or collection of replaced fields makes the data record less identifiable while remaining suitable for data analysis. (Source: Wikipedia)
In the context of Presidio, a pseudonymization operator would replace entity values with synthetic values, while maintaining a 1:1 mapping between the original value and the synthetic value.
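To make the 1:1 requirement concrete, here is a minimal sketch in plain Python that does not use Presidio at all. The function name `pseudonymize`, the `(start, end, entity_type)` span format, and the `<TYPE_n>` pseudonym format are assumptions made for illustration only, not part of any existing API.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type); hypothetical format


def pseudonymize(text: str, spans: List[Span], mapping: Dict[str, str]) -> str:
    """Replace each span with a pseudonym, reusing `mapping` for repeated values."""
    # First pass, left to right: assign pseudonyms in order of appearance.
    for start, end, entity_type in sorted(spans):
        original = text[start:end]
        if original not in mapping:
            prefix = f"<{entity_type}_"
            index = sum(1 for value in mapping.values() if value.startswith(prefix))
            mapping[original] = f"{prefix}{index}>"
    # Second pass, right to left: substitute without shifting earlier offsets.
    for start, end, _ in sorted(spans, reverse=True):
        text = text[:start] + mapping[text[start:end]] + text[end:]
    return text


mapping: Dict[str, str] = {}
text = "Alice met Bob. Later, Alice called Bob."
spans = [(0, 5, "PERSON"), (10, 13, "PERSON"), (22, 27, "PERSON"), (35, 38, "PERSON")]
result = pseudonymize(text, spans, mapping)
# result  -> "<PERSON_0> met <PERSON_1>. Later, <PERSON_0> called <PERSON_1>."
# mapping -> {"Alice": "<PERSON_0>", "Bob": "<PERSON_1>"}
```

Because the mapping is passed in and mutated, reusing the same dictionary across calls keeps pseudonyms consistent between documents. Where that dictionary should live is the question the options in this record try to answer.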
There are some decision factors to be addressed:
Considered options
1. Maintain an entity-mapping object within Presidio Anonymizer, and have the user pass a lambda that defines how to generate a new value.
2. Have the user pass an entity-mapping dictionary and receive an updated entity-mapping as part of the response. Generating a new mapping between original and synthetic values is the responsibility of Presidio Anonymizer.
3. Have the user pass an entity-mapping dictionary that is expected to be already up to date, so generating a new mapping is the client's responsibility.
4. Use the Replace operator and maintain the entire mapping logic on the client side.
- No change needed
- Full flexibility for the client side
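The difference between options 2 and 3 comes down to who fills in a missing entry. The sketch below is a rough illustration under stated assumptions: `resolve_option_2`, `resolve_option_3`, and the `generate` callable are hypothetical names, not Presidio APIs, and `generate` stands in for whatever generation logic the anonymizer would own.

```python
import itertools
from typing import Callable, Dict, Tuple


def resolve_option_2(
    value: str, mapping: Dict[str, str], generate: Callable[[str], str]
) -> Tuple[str, Dict[str, str]]:
    """Option 2: the anonymizer creates missing entries and returns the updated mapping."""
    updated = dict(mapping)  # copy so the caller's dictionary is not mutated
    if value not in updated:
        updated[value] = generate(value)
    return updated[value], updated


def resolve_option_3(value: str, mapping: Dict[str, str]) -> str:
    """Option 3: the client supplies a complete mapping; a missing entry is an error."""
    if value not in mapping:
        raise KeyError(f"no pseudonym supplied for {value!r}")
    return mapping[value]


counter = itertools.count(1)
client_mapping = {"Alice": "PERSON_0"}

# Option 2: "Bob" is unknown, so a pseudonym is generated and handed back.
pseudonym, client_mapping = resolve_option_2(
    "Bob", client_mapping, lambda _value: f"PERSON_{next(counter)}"
)

# Option 3: lookups succeed only for values the client has already mapped.
known = resolve_option_3("Alice", client_mapping)
```

With option 2 the client only has to persist the returned dictionary between calls. With option 3 the client must extend the dictionary itself before each call, which moves it closer to option 4.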
Other options?
If you are working on a pseudonymization use case, please share your feedback on this.
@feynmanliang
@lordlinus
@SharonHart