brainsciencecenter/flywheel-deidentification

flywheel-deidentification

Documentation and utilities for checking de-identification of data uploaded to Flywheel.

The profiles here are applied automatically when users upload DICOM data. The site profile is now applied automatically on import with fw ingest dicom or via the web interface.

Attempting to de-identify the data by any other means will raise an error. Ingest DICOM data with the fw ingest dicom command, without --de-identify and without any profiles in the config file.
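A wrapper script could guard against passing the de-identification option by accident. The sketch below is only an illustration: the helper name build_ingest_command and the positional arguments (source directory, group, project) are assumptions rather than anything defined in this repository, so check the CLI's own help output for the actual usage.

```python
def build_ingest_command(src_dir, group, project, extra_args=()):
    """Return an argv list for `fw ingest dicom`, rejecting --de-identify.

    The positional layout (src_dir, group, project) is an assumption.
    """
    for arg in extra_args:
        # The site profile is applied automatically, so this flag must not be passed.
        if arg == "--de-identify" or arg.startswith("--de-identify="):
            raise ValueError(
                "do not pass --de-identify; the site profile is applied automatically"
            )
    return ["fw", "ingest", "dicom", *extra_args, src_dir, group, project]
```

The returned list can be handed to subprocess.run. This check does not inspect the config file, which still needs to be free of profiles.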

Do not use older versions of the code with fw import dicom to import data to Flywheel.

If uploading data manually, or from a new source, you must test and verify the de-identification. First use fw deid test (see the profile page for details). Then upload a single session and check carefully. Notify the admins of any problems.

If you require a different de-identification profile than the site profile, please contact the site admin Gaylord Holder to discuss options.

How the de-identification works

The site profile removes direct identifiers and several indirect identifiers not normally required for research use. Certain indirect identifiers important for research (such as PatientWeight) are retained.

Data received from the scanner connectors is automatically de-identified using this profile.

Data imported via the web "DICOM Upload" interface also has this profile applied, unless a project-level profile is present. Contact the site admin if you need customized de-identification.

Update 2024-03: There is a known bug affecting Linux browsers; upload only from Mac or Windows. Use the latest Google Chrome. As always, when uploading from a new data source or machine, verify the results.

Data ingested via the fw ingest dicom command also has the site profile applied.

Older versions of the fw program allow the use of fw import dicom, which requires a profile on the command line. Do not use fw import dicom to import data to Flywheel.

Limitations of automated de-identification

The site profile removes standard DICOM tags that are designed to contain direct identifiers. It also removes some indirect identifiers that might give clues to the patient's identity or to other sensitive information, such as pregnancy status. However, automated de-identification has limitations.

Data uploaded manually, or from an external scanner, should be tested and checked thoroughly.

Potential de-identification failures may arise from:

  • Private DICOM tags. Private tags are NOT modified by the site profile. Data from new sources, whether inside or outside of Penn, should be checked carefully for identifiers in private tags. Flywheel can de-identify private tags but requires extra steps to do so.

  • Identifiers included in text fields such as ImageComments or StudyComments. The PatientComments tag is removed by the profile; others are not, because they are often used to store image information or to route data to the correct location in Flywheel. Investigators should ensure that text fields are never used to store identifiers.

  • Burned-in annotations. Clinical data may have patient information present in the pixel data. This needs special handling.

  • Identifiers in non-DICOM imaging data. Identifiers may be present in other file types, including NIfTI image data, ZIP archive metadata, and JSON sidecars. Headers, images, and metadata from any new source should be checked manually.

  • Metadata saved in a non-standard location. For example, some old data in Flywheel has a "metadata" field containing protected information.
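As a rough aid for the manual checks on private tags and text fields, a script can shortlist values that deserve a closer look. The sketch below is not one of the scripts in this repository. It assumes the header has already been read into plain dictionaries (how you obtain them is up to you), and the function names and regular expressions are hypothetical starting points. It relies only on the DICOM convention that private tags have an odd group number.

```python
import re

# Hypothetical patterns for values that warrant manual review; extend for your data.
SUSPICIOUS_PATTERNS = {
    "name_like": re.compile(r"[A-Za-z]+\^[A-Za-z]+"),        # e.g. FAMILY^GIVEN
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{8}\b"),  # 1970-01-02 or 19700102
    "long_number": re.compile(r"\b\d{6,}\b"),                # possible record numbers
}


def private_tags(header):
    """Return the (group, element) keys whose group number is odd (private tags)."""
    return [tag for tag in header if tag[0] % 2 == 1]


def flag_text_fields(fields):
    """Map each text field name to the pattern names its value matches."""
    flagged = {}
    for name, value in fields.items():
        hits = [label for label, pattern in SUSPICIOUS_PATTERNS.items()
                if pattern.search(str(value))]
        if hits:
            flagged[name] = hits
    return flagged


if __name__ == "__main__":
    header = {(0x0010, 0x0010): "ANON", (0x0029, 0x1010): "DOE^JOHN"}
    print(private_tags(header))
    print(flag_text_fields({"StudyComments": "pt DOE^JOHN dob 1970-01-02"}))
```

A clean result from a check like this does not show that the data is free of identifiers. It only narrows down where to look first.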

Further de-identification for data sharing

The site profile does not remove all potential indirect identifiers. Information such as the study date, the scanner, or internal study identifiers can remain in the header.
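Before sharing data, it can help to list which indirect identifiers are still populated in a header. The following is a minimal sketch, assuming the header is available as a dictionary keyed by DICOM keyword. The keyword list is illustrative only: this page does not state which of these tags the site profile keeps, and the function name is hypothetical.

```python
# Illustrative DICOM keywords for dates, scanner, and study identifiers.
# Which of these the site profile retains is not stated here; adjust as needed.
INDIRECT_KEYWORDS = (
    "StudyDate",
    "SeriesDate",
    "AcquisitionDate",
    "StationName",
    "DeviceSerialNumber",
    "InstitutionName",
    "StudyID",
    "AccessionNumber",
)


def remaining_indirect_identifiers(header, keywords=INDIRECT_KEYWORDS):
    """Return the listed keywords that are present in the header with a non-empty value."""
    return {key: header[key] for key in keywords
            if header.get(key) not in (None, "")}


if __name__ == "__main__":
    example = {"StudyDate": "20200101", "StationName": "", "Modality": "MR"}
    print(remaining_indirect_identifiers(example))
```

Anything this reports is a candidate for the custom secondary de-identification described next, not necessarily a problem for internal research use.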

Custom secondary de-identification is available through the deid-export gear.

Repository contents

  • Example data containing synthesized PHI that can be used to test import procedures. This data is derived from a publicly available de-identification test data set. If used in research, please include the citations in the README.

  • A copy of the de-identification profile that is applied on import.

  • Scripts to check Flywheel metadata and DICOM archives of data already uploaded to Flywheel.

Further reading on de-identification

DICOM standard de-identification profiles. Description of official de-identification profiles.

Free DICOM de-identification tools in clinical research: functioning and safety of patient privacy. Tests some popular DICOM tools and shows how difficult it is to de-identify DICOM data successfully.

De-identification of Medical Images with Retention of Scientific Research Value. From the Cancer Imaging Archive team. "It is extremely difficult to eradicate all PHI from DICOM images with automated software while at the same time retaining all useful information." A more detailed discussion of their de-identification routines can be found on the Cancer Imaging Archive Wiki.

Report of the Medical Image De-Identification (MIDI) Task Group -- Best Practices and Recommendations. Preprint discussing the complex issues surrounding de-identification for public data sharing.
