Filing this question on behalf of a user from a private thread.
The auto-generated plan sees that a column is PII, but it is SSN without the dashes …. Will the sdtype = ssn still work given no dashes, or is there a custom generator on our side to be developed?
Assuming that this is the metadata you have for your ssn column:
"my_ssn_column": {
"sdtype": "ssn",
"pii": true
}
Then by default, SDV synthesizers will generate random SSN values that contain dashes, e.g. 236-57-5670. This happens because SDV uses the Faker library for PII anonymization, and Faker's SSN generator only produces SSNs with dashes (see the Faker documentation).
The fix: Override the column
Luckily you can override the anonymization method. Instead of using Faker's SSN generator, you supply a generic generator that combines 9 random digits. To do this, create a generic transformer and apply it using the update_transformers function.
from rdt.transformers.pii import AnonymizedFaker
from sdv.single_table import GaussianCopulaSynthesizer

# a generic generator that creates combinations of 9 digits
my_ssn_transformer = AnonymizedFaker(
    provider_name=None,
    function_name='bothify',
    function_kwargs={'text': '#########'}
)

# apply this generator to the ssn column of your synthesizer
# (metadata and data are assumed to be already loaded)
synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.auto_assign_transformers(data)
synthesizer.update_transformers({
    'my_ssn_column': my_ssn_transformer
})
synthesizer.fit(data)
Now the synthetic data will contain SSN values without dashes, e.g. 375766167
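To confirm the override took effect, one option is to check the sampled values against a nine-digit pattern. The following is only a minimal sketch using the Python standard library: the helper names `is_dashless_ssn` and `all_dashless` are hypothetical and not part of SDV or RDT, and it assumes the values of the SSN column have already been pulled out of the synthetic data as plain strings.

```python
import re

# Exactly nine ASCII digits, no dashes or other separators.
_DASHLESS_SSN = re.compile(r"[0-9]{9}")


def is_dashless_ssn(value: str) -> bool:
    """Return True if value consists of exactly nine digits."""
    return _DASHLESS_SSN.fullmatch(value) is not None


def all_dashless(values) -> bool:
    """Return True if every value in the iterable is a dashless SSN."""
    return all(is_dashless_ssn(v) for v in values)
```

For example, `is_dashless_ssn('375766167')` is true, while the default dashed format `'236-57-5670'` fails the check.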