Allow a subset of data to be released #40
Comments
@dpshelio we might need some early-ish help with this, as we are working on two data releases for collaborators. Can you have a think about how much time it would take to fix this, and what it would mean for the other work we're asking you to do? |
This is just a security check. With it, no item can be silently ignored. For example, if I leave DOB out of identifiablevar by mistake, the program will give an error. You have to explicitly specify that DOB is a non-identifiable var to make the program run. If you guys think this is not necessary, I can remove the security check. |
In this case we can even remove the non-identifiable var slot in the conf file. |
Thanks. So if I understand rightly, before I had to explicitly state that dob was non-identifiable (and specify all variables as key, sensitive, non-identifying, etc.)? |
Correct. |
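For reference, a minimal sketch of the check as described above: every variable in the data has to appear in one of the conf slots, otherwise the run stops with an error. The slot names (`directvars`, `keyvars`, `sensvars`, `nonidvars`), the function name and the example columns are assumptions for illustration, not the project's actual code.

```python
# Hypothetical slot names; the real conf file may use different ones.
SLOTS = ("directvars", "keyvars", "sensvars", "nonidvars")


def check_all_classified(columns, conf):
    """Raise ValueError if any column is not listed in one of the conf slots."""
    classified = set()
    for slot in SLOTS:
        classified.update(conf.get(slot, []))
    missing = [col for col in columns if col not in classified]
    if missing:
        raise ValueError("unclassified variables: " + ", ".join(missing))


# Illustrative data: dob has been left out of the conf by mistake.
conf = {
    "directvars": ["nhs_number"],
    "keyvars": ["age", "sex"],
    "sensvars": ["diagnosis"],
}
columns = ["nhs_number", "age", "sex", "diagnosis", "dob"]

try:
    check_all_classified(columns, conf)
except ValueError as err:
    print(err)  # reports dob as unclassified
```

Adding `dob` to `nonidvars` (or any other slot) is what lets the run proceed under this rule.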
If we remove the non-identifiable variable slot then we won't have a way of requesting those variables, i.e. the researcher/requester specifies the variables they want and classifies those variables as direct/key/sensitive/non-identifying. We review this and, if we are happy with the classification, we run it, and that subset of variables is extracted and anonymised as per the classification and k/l configuration. We then hand over the data. If we remove the non-identifying label we'll have to add those variables back in manually at the end. |
Sorry, I made a mistake in the previous conversation!!
Yes.
No, dob will be released as it is!!!
Do you think it is necessary to switch the default to "direct identifiable"?
In the end, the logic becomes: if a variable does not appear in key/sens/non-identifiable, it will be removed. That makes "direct var" redundant. |
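A rough sketch of that default-drop rule, assuming hypothetical slot names `keyvars`, `sensvars` and `nonidvars` and a made-up `plan_release` helper (none of these are taken from the actual code base). Anything the requester did not list is dropped, so no separate "direct" slot is needed.

```python
def plan_release(columns, conf):
    """Map each column to how it would be handled in the release.

    Columns not listed in any slot are dropped by default.
    If a column is listed in more than one slot, the first match wins.
    """
    actions = {}
    for col in columns:
        if col in conf.get("keyvars", []):
            actions[col] = "key"        # generalised for k-anonymity
        elif col in conf.get("sensvars", []):
            actions[col] = "sensitive"  # checked for l-diversity
        elif col in conf.get("nonidvars", []):
            actions[col] = "as-is"      # released unchanged
        else:
            actions[col] = "drop"       # never requested, so excluded
    return actions


# Illustrative request: nhs_number and ward are simply not mentioned.
conf = {"keyvars": ["age", "sex"], "sensvars": ["diagnosis"], "nonidvars": ["dob"]}
columns = ["nhs_number", "age", "sex", "diagnosis", "dob", "ward"]

for col, action in plan_release(columns, conf).items():
    print(col, action)
```

With this rule the requester only lists what they need, and a forgotten identifier is excluded from the release instead of leaking into it.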
Are we going to run conf files supplied directly by users? That is a potential security hazard: someone could put a chunk of code in the conf file. I would not let users run their own configuration files unless we make the configuration file safer. |
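One possible way to make the conf file safer, sketched under the assumption that it could be rewritten as plain JSON: parse it as data only and reject anything unexpected, so nothing in it is ever executed. The slot names, the `k`/`l` keys and the `load_conf` helper are hypothetical and not the current conf format.

```python
import json

# Hypothetical set of keys a request is allowed to contain.
LIST_SLOTS = ("keyvars", "sensvars", "nonidvars")
INT_SETTINGS = ("k", "l")


def load_conf(text):
    """Parse a conf file as JSON data and validate its shape.

    Raises ValueError for invalid JSON, unknown keys or wrong value types.
    """
    conf = json.loads(text)  # json.JSONDecodeError is a ValueError
    if not isinstance(conf, dict):
        raise ValueError("conf must be a JSON object")
    unknown = set(conf) - set(LIST_SLOTS) - set(INT_SETTINGS)
    if unknown:
        raise ValueError("unknown conf keys: " + ", ".join(sorted(unknown)))
    for slot in LIST_SLOTS:
        values = conf.get(slot, [])
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            raise ValueError(slot + " must be a list of variable names")
    for name in INT_SETTINGS:
        if name in conf:
            value = conf[name]
            # bool is a subclass of int in Python, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, int) or value < 1:
                raise ValueError(name + " must be a positive integer")
    return conf


conf = load_conf('{"keyvars": ["age", "sex"], "sensvars": ["diagnosis"], "k": 5}')
print(conf["keyvars"])
```

A file containing code is then just invalid input: it either fails to parse or fails validation, and is never run.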
I think that we should be semantically consistent, so Directvars should be identifiers. If we use Directvar to specify a variable that we want removed, does it appear in the release with every value replaced by missing, or is it just 'dropped'? If possible
Users then
If the user is unhappy with the information loss they then need to alter the data request (drop columns) or negotiate a lower k-anon/l-div spec based on their relationship and local security arrangements. What do you think? |
More intuitive for the users. Not such a big change. It's doable. |
at the moment we need to specify all fields
we would prefer just to specify fields that we need
unspecified fields are excluded and then the extract and statistical disclosure control just works with those requested
see https://cchic.slack.com/archives/C2MEV9Y13/p1498490027173478