Allow a subset of data to be released #40

docsteveharris · 2017-06-29T11:36:36Z

At the moment we need to specify all fields.
We would prefer to specify just the fields that we need.
Unspecified fields would be excluded, and the extract and statistical disclosure control would then work only with the fields requested.

see https://cchic.slack.com/archives/C2MEV9Y13/p1498490027173478

docsteveharris · 2017-06-29T11:37:40Z

@dpshelio might need some early-ish help with this as we are working on 2 data releases for collaborators. Can you have a think about how much time it would take to fix this and what it would mean for the other work we're asking you to do?

sinanshi · 2017-06-29T12:32:25Z

This is just a security check. With it, no item will be ignored. E.g. if I leave DOB out of identifiablevar by mistake, the program will give you an error. You have to explicitly specify that DOB is a non-identifiable var to make the program run. If you think this is not necessary, I can remove the security check.

sinanshi · 2017-06-29T12:38:27Z

In this case we can even remove the non-identifiable var slot in the conf file.

docsteveharris · 2017-06-29T13:10:30Z

Thanks. So if I understand rightly:

before - I had to explicitly state that dob was non-identifiable (and specify all variables as key, sensitive or non-identifying etc.)
now - I don't need to explicitly state that dob is non-identifiable, but if I don't then dob is not released anyway/anywhere

sinanshi · 2017-06-29T17:42:54Z

Correct.

docsteveharris · 2017-06-30T05:41:26Z

If we remove the non-identifiable variable slot then we won't have a way of requesting those variables ...

i.e. the researcher/requester specifies the variables they want and classifies those variables as direct/key/sensitive/non-identifying. We review this and, if happy with the classification, we run it, and that subset of variables is extracted and anonymised as per the classification and k/l configuration. We then hand over the data ...

If we remove the non-identifying label we'll have to add back in those variables manually at the end ...

sinanshi · 2017-06-30T09:25:36Z

Sorry, I made a mistake in the previous conversation!!

before - I had to explicitly state that dob was non-identifiable (and specify all variables as key, sensitive or non-identifying etc.)

Yes.

now - I don't need to explicitly state that dob is non-identifiable, but if I don't then dob is not released anyway/anywhere

No, dob will be released as it is!!!

A variable is removed only when we explicitly specify it as a "direct var".
A variable is modified only when it is specified as a key/sensitive var.
A variable remains untouched (i.e. is released) if it is not specified as a direct/key/sensitive var.
The rest are assumed to be non-identifiable by default, i.e. they remain in the release.

Do you think it is necessary to switch the default to "direct identifiable"?

direct var: remove from the release.
key/sens: modified
non-identifiable: remain.
the rest: treated as direct var, i.e. removed.

In the end, the logic becomes -- if the variable does not appear in key/sens/non-identifiable, it will be removed. It makes "direct var" redundant.
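The difference between the two defaults can be sketched roughly as below. This is only an illustration: the slot names (`direct`, `key`, `sensitive`, `nonidentifiable`), the variable names and the `resolve_action` helper are all hypothetical and are not taken from the package.

```python
def resolve_action(variable, conf, default="keep"):
    """Return what happens to `variable` in the release.

    `conf` maps hypothetical slot names to lists of variable names.
    `default` is the fate of a variable that appears in no slot:
    "keep" mirrors the behaviour described above, "remove" mirrors
    the proposed switch of the default.
    """
    if variable in conf.get("direct", []):
        return "remove"
    if variable in conf.get("key", []) or variable in conf.get("sensitive", []):
        return "modify"
    if variable in conf.get("nonidentifiable", []):
        return "keep"
    return default


# Hypothetical configuration in which dob has been forgotten.
conf = {
    "direct": ["nhs_number"],
    "key": ["age"],
    "sensitive": ["diagnosis"],
    "nonidentifiable": ["heart_rate"],
}

current = resolve_action("dob", conf)                     # unlisted variable is kept
proposed = resolve_action("dob", conf, default="remove")  # unlisted variable is dropped
```

With `default="remove"`, listing a variable under `direct` has the same outcome as not listing it at all, which is the redundancy mentioned above.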

sinanshi · 2017-06-30T09:31:36Z

Are we going to run conf files supplied directly by users? There is a potential security hazard: one can put a chunk of code in the conf file. I would not recommend letting users run their own configuration file unless we make the configuration file safer.
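One way to make a user-supplied conf file safer is to parse it strictly as data and validate its shape before use. The sketch below assumes, purely for illustration, that the configuration is JSON with four hypothetical slot names; the thread does not say what format the real conf file uses.

```python
import json

# Hypothetical slot names, used only for this illustration.
ALLOWED_SLOTS = {"direct", "key", "sensitive", "nonidentifiable"}


def load_conf(text):
    """Parse a configuration string as plain data and validate its shape."""
    conf = json.loads(text)  # parses data only; nothing in the file is executed
    if not isinstance(conf, dict):
        raise ValueError("configuration must map slot names to variable lists")
    unknown = set(conf) - ALLOWED_SLOTS
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    for slot, names in conf.items():
        if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
            raise ValueError(f"slot {slot!r} must be a list of variable names")
    return conf


safe = load_conf('{"key": ["age"], "nonidentifiable": ["heart_rate"]}')
```

Anything that is not a mapping from a known slot to a list of strings is rejected, so a conf file cannot carry executable content through this loader.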

docsteveharris · 2017-06-30T10:14:39Z

I think that we should be semantically consistent, so Directvars should be identifiers. If we use Directvar to specify a variable that we want removed, does it appear in the release but with missing values replacing all the data, or is it just 'dropped'?

If possible:

direct var: removed from the release (because it is an identifier; could remain as a column of missingness)
key/sens: modified (remains)
non-identifiable: remains unchanged, but won't be there unless explicitly requested
the rest (anything not specifically requested): removed, no column heading, no data at all
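The "dropped" versus "column of missingness" question can be illustrated with a small sketch over a list of records. The `build_release` function, the slot names and the example columns are hypothetical, and the actual anonymisation of key/sensitive values is deliberately not shown.

```python
def build_release(rows, conf, keep_direct_as_missing=False):
    """Apply the rules listed above to a list of dict records (illustrative only)."""
    direct = set(conf.get("direct", []))
    modified = set(conf.get("key", [])) | set(conf.get("sensitive", []))
    untouched = set(conf.get("nonidentifiable", []))
    release = []
    for row in rows:
        out = {}
        for name, value in row.items():
            if name in direct:
                if keep_direct_as_missing:
                    out[name] = None  # column kept, every value missing
                # otherwise the identifier is dropped entirely
            elif name in modified or name in untouched:
                # Key/sensitive values would be anonymised here; that step
                # is outside this sketch, so the value passes through.
                out[name] = value
            # Anything not requested gets no column and no data.
        release.append(out)
    return release


rows = [{"nhs_number": "123", "age": 40, "heart_rate": 80, "postcode": "AB1"}]
conf = {"direct": ["nhs_number"], "key": ["age"], "nonidentifiable": ["heart_rate"]}

dropped = build_release(rows, conf)
masked = build_release(rows, conf, keep_direct_as_missing=True)
```

Here `postcode` was never requested, so it is absent from both results, while `nhs_number` is either absent or present with missing values depending on the flag.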

The workflow is then:

users list the columns they want
we provide the classification using pre-agreed definitions
we recommend/provide a k-anon/l-div threshold based on their relationship with us
we provide data with a measure of information loss

If the user is unhappy with the information loss they then need to alter the data request (drop columns) or negotiate a lower k-anon/l-div spec based on their relationship and local security arrangements.
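The first two steps of this workflow (users list columns, we classify them from pre-agreed definitions) could look something like the sketch below. The `DEFINITIONS` table, its contents and `classify_request` are assumptions made for illustration; thresholds and information loss are not covered.

```python
# Hypothetical pre-agreed classification of every releasable variable.
DEFINITIONS = {
    "nhs_number": "direct",
    "age": "key",
    "diagnosis": "sensitive",
    "heart_rate": "nonidentifiable",
}


def classify_request(requested, definitions=DEFINITIONS):
    """Turn a plain list of requested columns into a classified configuration."""
    unknown = [name for name in requested if name not in definitions]
    if unknown:
        # Refuse to guess: every requested column needs an agreed class.
        raise KeyError(f"no agreed classification for: {unknown}")
    conf = {"direct": [], "key": [], "sensitive": [], "nonidentifiable": []}
    for name in requested:
        conf[definitions[name]].append(name)
    return conf


request = classify_request(["age", "heart_rate"])
```

Because only requested columns enter the configuration, anything the user did not ask for is left out of the release without needing to be classified by them.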

What do you think?
Is this a big change?

sinanshi · 2017-06-30T21:50:59Z

More intuitive for the users. Not such a big change. It's doable.

sinanshi · 2017-07-18T15:30:55Z

Probably means that I should also have removed the direct identifier fields, and then we would have a master file that double-checks that we weren't being asked for any of these, but that might be a later piece of work.

docsteveharris assigned dpshelio and sinanshi Jun 29, 2017

sinanshi added a commit that referenced this issue Jun 29, 2017

remove security check #40

7763d58

sinanshi mentioned this issue Jul 24, 2017

Better config #41

Merged
