June 2024 notes Keynote and Session 1 #61

Merged: 2 commits, Jun 27, 2024
@@ -1,17 +1,12 @@
# Facilitating researcher access to data with PySyft
# Breakout: Facilitating researcher access to data with PySyft

**Leads**: Dave Buckley (OpenMined)

## Proposal
## Notes

### Summary
Dave provided a quick overview of OpenMined as an organisation and its flagship open source product, PySyft.

[OpenMined](https://openmined.org/) will present their [PySyft notebook](https://github.com/OpenMined/PySyft) which allows to "Perform data science on data that remains in someone else's server"
[OpenMined](https://openmined.org)
[PySyft](https://github.com/OpenMined/pysyft)

### Preparation

No required preparation beyond an open mind!

### Target audience

No specific target audience in mind - anybody interested!
The aim is to facilitate "remote data science" (cf. OpenSAFELY) by providing a framework for such services.
@@ -1,14 +1,8 @@
# Do we need an IG working group?

Breakout discussion
# Breakout: Do we need an IG working group?

**Leads**: Amy Tilbrook (UoE)

## Proposal

Breakout discussions are open talks around a topic. They are open ended, and while we hope actions and collaboration arise from them, there is no specific output expected by the end of the session.

### Prompts
## Prompts

- Is there a perception of lack of IG knowledge in UK TRE community ("an IG black hole"?)
- What support would benefit the community?
@@ -21,21 +15,65 @@ A) Specific IG working group? Thoughts on:

B) Is IG an element in ALL working groups? (as per previous group days: "working groups need a purpose, as creating and maintaining one takes effort, many people are interested in everything/all groups"). How could this be supported?

C) Something else?
C) Something else? Previous suggestions for remit of an IG working group:

- Previous suggestions for remit of an IG working group:
- "Something around Information Governance and policies" e.g. ISO27001 and also local e.g. University policies
- Advice from contributors for specific issues -
- Advice from contributors for specific issues
- Aligning strategies for dealing with TRE-specific IG challenges - e.g. AI/ML, commercial access, international access

### Summary

Open discussion on the need of an Information Governance working group, its potential remit and possible members.

### Preparation

No required preparation beyond an open mind!

### Target audience

No specific target audience in mind - anybody interested!
## Notes

- What do we mean by IG? Each org uses the term slightly differently, so we don't have a unified conception of what is meant.
- Grampian "it covers the whole process from the minute a researcher contacts us, to when the project data is deleted"
- Technical stuff is seen to be easy, IG stuff is hard - would like to know how we are supposed to interact with all the different projects
- Difference between umbrella TRE IG and project related IG
- IG = context and organisation dependent - might depend on risk appetite or data held.
- Looking to adapt existing processes to a TRE way of working (data access rather than data sharing)
- Definite appetite for support in this area.
- Goes beyond/outside of ethics/regulatory compliance.
- Is there an interest in standardisation? With such context/data related specifics is this even possible?

- For organisations who have been doing this a long time, the differences between controllers/processors and roles for TREs are fairly established.
- Scotland is working towards a federated approach to governance across Scottish TREs to facilitate/streamline data sharing.
- Often issues arise in research governance rather than IG or data protection teams - we can't change the rules/regulations around health research governance.
- Why does it take 2 years to get access to data?
- Does IG encompass data sharing (contracting)?
- Legal basis issues within NHS SDEs

- IG Challenges:

- Consented vs unconsented projects: How does consent unlock data to be used by a project?
- What advice and guidance can be given when talking to new organisations/other contexts?
- Consented vs. unconsented studies
- Research governance a barrier
- Anonymous vs. identifiable data
- How would the 'IG' world support a person's consent, so that data held by a data controller organisation can be accessed by the project they have consented to?
- How would the IG world react to technology options that extend control across federated facilities, speeding up access to the data while maintaining as much control and governance oversight as possible?
- What are the rules and how can we articulate them across the board?

- Ideas:
- A group that could give advice on how to negotiate the relationships?
- Work towards TRE specific research governance to support research governance teams - what?
- Playbook for open IG (needs to be fairly high level) - e.g. survey of high level workflows of IG processes within the 5 Safes. How to determine safe people, safe settings etc?
- Have a look at these pages - transparency standards: https://www.abdn.ac.uk/research/digital-research/accessing-data-1688.php and https://www.abdn.ac.uk/research/digital-research/obtaining-permissions-1703.php
- A community where questions could be posed and answered, e.g. a discussion forum on IG set-ups and definitions.

## Summary

- Technical is easy, IG is hard - because:
- What is IG? What does it encompass (difference between IG and research governance?)
- IG is context, data, organisation dependent (risk appetites)
- Anything created for general use would need to be fairly high level
- Appetite
- to understand what IG set ups there are across the TRE community
- to develop something to support TRE specific research governance/open IG
- to have a forum where IG questions could be asked (both umbrella IG for TREs in general - especially for newer TREs, and project specific IG issues - e.g. consented studies)
- to understand what is specific TRE IG (data access) rather than data sharing IG

Suggested first things for an IG group to do:

- Find drivers/champions/leads
- See Grampian examples of workflows
- Set up survey of
- What does IG mean in your context?
- What workflows/governance processes can be shared?
@@ -1,19 +1,45 @@
# New research: language to use when explaining SDEs and TREs to the public
# Breakout: New research: language to use when explaining SDEs and TREs to the public

**Leads**: Emma Morgan (Understanding Patient Data)

## Proposal

### Summary

[Understanding Patient Data](https://understandingpatientdata.org.uk/)(UPD) has recently published their [final report](https://understandingpatientdata.org.uk/what-words-use) on the What Words To Use project with Research Works, which focused on exploring the best language to use when explaining Secure Data Environments and Trusted Research Environments to the public.

During the event UPD will make a 20 minutes presentation on the project and its results, followed by an open discussion with the community.

### Preparation

No required preparation beyond an open mind!

### Target audience

No specific target audience in mind - anybody interested!
[Understanding Patient Data](https://understandingpatientdata.org.uk/) (UPD) has recently published its [final report](https://understandingpatientdata.org.uk/what-words-use) on the _What Words To Use_ project with Research Works, which explored the best language to use when explaining Secure Data Environments and Trusted Research Environments to the public.

## Notes

- Emma took us through the UPD project: how to explain TREs and related terms to the public, and how to generate some explainer materials.
- Part 1: Rapid evidence review
- Patients supportive of direction to data access through TREs
- limited evidence on specific aspects of TREs
- Commercial use of data sometimes controversial
- Comms around TREs: explaining TREs is hard
- Lack of consistency in terms used (SDE, TRE, etc.). The variety of names is confusing and needs to be resolved
- 5 Safes useful as conceptual basis
- Benefits of data use key
- Don't assume prior knowledge
- Part 2: workshops
- 7? workshops, 6 participants each, tried to provide a good demographic mix (age, ethnicity, gender, digital exclusion)
- People care about: Is the data identifiable? Who has access? Reassurance that the data is safe. What the data is being used for, and for what purpose/benefit.
- Some consensus in preferences over the use of certain terms/language.
- Part 3: Explainer materials/draft resource: different tiered 'levels' of information for different levels of interest
- 2x workshops
- Interviews with domain experts to fact-check
- 1st level: Concise description of TRE/SDEs
- 2nd level: Animation being prepared with storyboard and voiceover
- 3rd level: more detailed info on specific terms (e.g. 5 Safes)

## Discussion

- How might you use this information/resource?
- Honest broker service in NI, will flag this report with team who are leading on some work on public transparency (funding from UKRI). Liked the way the materials are adaptable for own use
- Works in HDR Global, lower and middle income countries, lots of interest in TREs there. Work could be useful across these different regions, approach could be taken and tested across different regions.
- RDS released a TRE explainer, and will tweak it to reflect some of the findings from this work (overuse of the term 'de-identified').
- Concerns about methodology, findings or resources that would limit you adopting them?
- What do you think about the balance between transparency and accessibility?
- What other topics related to TREs would benefit from PPIE?

## Summary

Presentation then discussion with positive feedback.
There will be an animation that can be voiced over by different TREs with their specifics, accents...

Concerns about resources: they try to offer something for everyone, but there will always be gaps.
30 changes: 21 additions & 9 deletions docs/events/wg_workshops/2024-06-05-june-meeting/index.md
@@ -11,6 +11,7 @@
:maxdepth: 1
:hidden: true

keynote-crick-tre
workshop-data-processing-tools
discussion-information-governance
discussion-what-words-to-use
@@ -44,16 +45,27 @@ discussion-working-groups

### Keynote

A talk by Pete Barnsley about the Crick's Institute TRE, its current approach and the history of how it came to be.
A [talk by Pete Barnsley about the Crick Institute's TRE](keynote-crick-tre), its current approach and the history of how it came to be.

### Community updates

_A chance for anyone in the community to share quick updates with everyone on the call._

The Community Management Working Group will share progress and general updates and next steps.
Working groups leads will then update the community on their progress and present their charters
#### Working group charters

This update will mark the start of the official community review period, after which working groups will be formally approved.
This is the start of the official community review period, after which working groups will be formally approved.

- Glossary https://www.uktre.org/en/latest/structure/tre-glossary.html
- Cybersecurity Risk https://www.uktre.org/en/latest/structure/cybersecurity-risk.html
- SATRE https://www.uktre.org/en/latest/structure/satre.html
- Funding and sustainability https://docs.google.com/document/d/1RMEbzt4SIeXqiYjHI-OVuDsPRJCEGThNdsnmKVmCxWE/edit
- SDE/TRE terminology https://www.uktre.org/en/latest/structure/sde-tre-terminology.html
- Extending control https://www.uktre.org/en/latest/structure/extending-control.html
- Citizen Agency https://www.uktre.org/en/latest/structure/citizen-agency.html

#### UK TRE SDE / TRE Definitions survey

New SDE/TRE survey available: https://forms.office.com/e/VaB9202cpB

### Breakout sessions

@@ -63,11 +75,11 @@ There will be two sessions on the day of 45 minutes each.

#### Session 1

- [](./workshop-data-processing-tools.md) - workshop
- [](./discussion-information-governance.md) - discussion
- [](./discussion-what-words-to-use.md) - discussion
- [](./workshop-researcher-registry.md) - workshop
- [](./discussion-data-access-pysyft.md) - discussion
- [](./workshop-data-processing-tools.md) - Workshop
- [](./discussion-information-governance.md) - Discussion
- [](./discussion-what-words-to-use.md) - Discussion
- [](./workshop-researcher-registry.md) - Workshop
- [](./discussion-data-access-pysyft.md) - Discussion

#### Session 2

@@ -0,0 +1,35 @@
# Keynote: Crick's TRE: history and approach

<iframe width="560" height="315" src="https://www.youtube.com/embed/1FqVEP0OVlY?si=9OoPOnnTe90sAvv6" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

## Q&A

- How cloud agnostic is it?
- Azure, AWS and GCP, but not beyond that (limited to what Snowflake supports)
- How much of what we saw is in operation? Everything, working for a year and a half
- focus of the process on legal and agreement aspects via forms
- Can you explain the metadata side of the platform, e.g. what software is being used?
- Snowflake offers standard data model for all objects in all accounts
- Projects are created in Snowflake sub-accounts
- Data brought out of Snowflake, collated, and replicated as one outside of Snowflake
- You have a lot of integration across providers: IBM, Microsoft and AWS. How have you overcome interoperability challenges?
- Lots of these integrations are at a data level, which is easier than at a functionality level for those tools
- Do you ever act as data processors on behalf of data providers or is this effectively a data hosting service (where the data comes processed by the data providers/controllers)?
- Both. If you are a data processor in your own right, you could bring your own "bedroom" and retain control of it
- Any idea of cost?
- https://www.snowflake.com/en/data-cloud/pricing-options/
- How are you managing audit and compliance across the three cloud platforms?
- Snowflake does it! Objects are created by Snowflake on the three clouds, ensuring compliance
- Am I right in thinking you only host consented data?
- How do users see what they are spending in the platform, or how many of their credits they have used?
- Automatic threshold detector for every processing compute, with a message sent when a threshold is reached (e.g. 75%)
- Who are the Roles (Loader, Processor, etc.) assigned to: study team members, central services, IG specialists?
- They are assigned by the accountable owners of each collaborating partner, or by the board for the collaboration account, via authorising emails or actions in ServiceNow. A person can therefore hold many roles if approved by their organisation or the project.
- Can you provide a quick overview of Snowflake and the core features it provides?
- Their website is well documented. I would try these links:
- https://docs.snowflake.com/en/user-guide/organizations
- https://docs.snowflake.com/en/user-guide/admin-account-identifier
- https://docs.snowflake.com/en/guides-overview-sharing#label-about-direct-share
- https://other-docs.snowflake.com/en/collaboration/provider-listings-auto-fulfillment
- https://other-docs.snowflake.com/en/collaboration/collaboration-listings-about
- https://docs.snowflake.com/en/sql-reference/commands-user-role
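
The spend alert mentioned in the Q&A above (a message sent once usage passes a threshold such as 75%) could be sketched roughly as follows. This is an illustrative Python sketch only, not the Crick's or Snowflake's implementation: the function names, the default threshold values, and the credit figures are all assumptions made for the example.

```python
from typing import List, Optional, Sequence


def crossed_thresholds(
    credits_used: float,
    credit_budget: float,
    thresholds: Sequence[float] = (0.5, 0.75, 0.9),  # assumed alert levels
) -> List[float]:
    """Return the thresholds (as fractions of budget) that usage has reached."""
    if credit_budget <= 0:
        raise ValueError("credit_budget must be positive")
    fraction_used = credits_used / credit_budget
    return [t for t in sorted(thresholds) if fraction_used >= t]


def alert_message(
    project: str,
    credits_used: float,
    credit_budget: float,
    thresholds: Sequence[float] = (0.5, 0.75, 0.9),
) -> Optional[str]:
    """Build a message for the highest threshold reached, or None if none was."""
    crossed = crossed_thresholds(credits_used, credit_budget, thresholds)
    if not crossed:
        return None
    highest = crossed[-1]
    return (
        f"{project}: {highest:.0%} of compute credits used "
        f"({credits_used:g} of {credit_budget:g})"
    )


if __name__ == "__main__":
    # Hypothetical project that has used 80 of its 100 credits.
    print(alert_message("demo-project", 80, 100))
```

In practice the usage figures would come from the platform's own metering rather than being passed in by hand; the sketch only shows the threshold comparison itself.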
@@ -1,26 +1,43 @@
# Data processing tools

Workshop
# Workshop: Data processing tools

**Leads**: James Friel (University of Dundee), Aida Sanchez (UCL)

## Proposal
Discussion around data processing, de-identification, and cohort building.

## Required preparation

A general understanding of data anonymisation.
The ICO anonymisation guidance & the ADF (anonymisation decision making framework) may be of interest as a grounding in this.

## Target audience

People who work in data de-identification and data providers for TREs

### Prompts
## Prompts

- Risk appetite to deposit data in a TRE - What level on de-identification is comfortable for use within a TRE? e.g truncation, pseudo-anonymization
- Risk appetite to deposit data in a TRE - What level of de-identification is comfortable for use within a TRE? e.g. truncation, pseudonymisation
- What do current data processing pipelines look like? And are there pain points in the process?
- What De-identification tools are being used? What has worked? What hasn't?
- What de-identification tools are being used? What has worked? What hasn't?
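
As a concrete reference point for the prompts above, the two techniques named (truncation and pseudonymisation) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a tool discussed in the session: the record fields (`patient_id`, `postcode`, `date_of_birth`, `diagnosis`), the function names, and the choice of a keyed HMAC-SHA256 hash are all hypothetical, and whether this level of de-identification is sufficient depends on the data and the risk appetite of the data provider.

```python
import hashlib
import hmac


def truncate_postcode(postcode: str) -> str:
    """Keep only the first (outward) part of a space-separated postcode."""
    parts = postcode.strip().upper().split()
    return parts[0] if parts else ""


def truncate_date_to_year(iso_date: str) -> str:
    """Reduce an ISO date string 'YYYY-MM-DD' to its year."""
    return iso_date[:4]


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, shortened).

    The same identifier and key always give the same pseudonym, so records
    can still be linked, but the key holder controls re-identification.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Apply truncation and pseudonymisation to one hypothetical record."""
    return {
        "pseudo_id": pseudonymise(record["patient_id"], secret_key),
        "postcode_area": truncate_postcode(record["postcode"]),
        "birth_year": truncate_date_to_year(record["date_of_birth"]),
        "diagnosis": record["diagnosis"],
    }


if __name__ == "__main__":
    example = {
        "patient_id": "0000000001",  # made-up identifier
        "postcode": "DD1 4HN",
        "date_of_birth": "1980-05-17",
        "diagnosis": "asthma",
    }
    print(deidentify(example, secret_key=b"replace-with-a-managed-secret"))
```

The keyed hash is used rather than a plain hash so that pseudonyms cannot be recomputed by anyone who only knows the identifier format; the key itself would need to be held outside the TRE.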

### Summary
## Notes

Intro discussion around data processing, de-identification , and cohort building.
CPRD (Clinical Practice Research Datalink)

### Preparation
- https://www.cprd.com/cprd-tre-features-guide-users

A general understanding of data anonymisation.
The ICO anonymisation guidance & the ADF (anonymisation decision making framework) may be of interest as a grounding in this.
Canon: has non-open-source tools (DICOM, FHIR, CSV, Free Text, 'omics, Pathology)

- Only available via agreement with https://research.eu.medical.canon/

NetCDF, ArcGIS Enterprise, 100+ TB of data, Spark to process data

- Provide data to federated TREs

Plans for using OpenShift. Possible batch schedulers:

- https://www.coreweave.com/blog/sunk-slurm-on-kubernetes-implementations
- https://kueue.sigs.k8s.io/

### Target audience
## Summary

People who work in data de-identification and data providers for TRES
General discussion of approaches and tools used.