Replies: 1 comment
-
Hi @denisw, Presidio itself is deterministic. The usual suspects are the NER model and Azure AI Language. Depending on the type of NER model you're using (spaCy, transformers, stanza), I would suggest looking into fixing the seed for those.
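For example, a minimal sketch of what fixing those seeds could look like before the analyzer is constructed. The seed value and the set of backends covered here are assumptions rather than a confirmed Presidio recipe; only the backend actually in use needs to be seeded:

```python
# Pin the RNGs of the common NER backends before loading the model.
import random

import numpy as np

random.seed(0)
np.random.seed(0)

try:
    # spaCy backend: fix_random_seed seeds Python, NumPy and (if installed) torch RNGs.
    import spacy
    spacy.util.fix_random_seed(0)
except ImportError:
    pass

try:
    # transformers backend: set_seed seeds random, NumPy and torch (incl. CUDA).
    from transformers import set_seed
    set_seed(0)
except ImportError:
    pass
```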
-
I am using Presidio together with Azure AI Language and some custom analyzers (including context enhancement), and thought it would be a good idea to create a regression test that takes some known input texts and checks that the anonymization result is the same as in the past.
I noticed, though, that this kind of test is flaky: sometimes the analysis and anonymization results differ slightly from the previous value for some runs, only to match again in some later run.
Is it expected that Presidio's recognizers are not fully deterministic? Is there some source of randomness that can perhaps be controlled? Or should I simply not count on the same text resulting in the same analyzer results and anonymized text?
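For context, the test is roughly of this shape. This is a minimal sketch using the default Presidio engines; the sample text and expected output are placeholders, and the real setup plugs in Azure AI Language plus the custom recognizers and context enhancement:

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine


def test_anonymization_is_stable():
    text = "My name is John Smith and my phone number is 212-555-0100."

    # Default engines; assumes the default spaCy English model is installed.
    analyzer = AnalyzerEngine()
    anonymizer = AnonymizerEngine()

    results = analyzer.analyze(text=text, language="en")
    anonymized = anonymizer.anonymize(text=text, analyzer_results=results)

    # Expected value captured from a previous run; the flakiness shows up
    # as intermittent mismatches on this assertion.
    assert anonymized.text == (
        "My name is <PERSON> and my phone number is <PHONE_NUMBER>."
    )
```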