
Best Practices for Standardized Quality Control (QC) and Quality Assurance (QA)

Gabriel A. Devenyi edited this page Nov 11, 2024 · 3 revisions

Quality Control (QC) and Quality Assurance (QA) are essential components of all types of scientific research. The outcome of a specific statistical hypothesis test relies on both the overall quality of the data collected and the detection and removal of outliers, which can unduly impact typical parametric tests.

This document is intended as a guide to the general process of "how to do QC/QA"; while it provides examples, it is not a comprehensive guide to evaluating the correctness of any specific data or of steps within a pipeline. Links are provided wherever possible to guides which do give extensive examples and guidance for specific data.

General principles

The ultimate goal of Quality Control is to produce a true/false signal for the inclusion/exclusion of a specific piece of data within a given analysis. Such a decision is ultimately binary; however, some data is more or less valuable in terms of rarity, and in some cases "perfect" data may simply be unavailable, so "grading" data on an ordinal scale from "terrible" to "perfect" gives more flexibility in making inclusion/exclusion decisions. The exact number of steps on the scale is ultimately determined by one's ability to differentiate grades of quality/error, which depends on both the type of data being evaluated and the individual rater.
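The relationship between an ordinal grading scale and the final binary decision can be sketched as follows. This is a minimal illustration; the grade labels and thresholds are examples, not a lab standard.

```python
# Illustrative sketch: an ordinal QC scale collapsed to a binary
# include/exclude decision with an analysis-specific threshold.
GRADES = {0: "terrible", 1: "poor", 2: "fair", 3: "good", 4: "perfect"}

def include(grade: int, threshold: int = 2) -> bool:
    """Include data whose grade meets the analysis-specific threshold."""
    if grade not in GRADES:
        raise ValueError(f"unknown grade: {grade}")
    return grade >= threshold

# A strict analysis may raise the threshold; when data is rare and
# "perfect" data unavailable, the threshold may be lowered instead.
include(3)                # included at the default threshold
include(1, threshold=1)   # included only under a relaxed threshold
```

The ordinal grades are recorded once; only the threshold changes per analysis, so the same QC ratings can be reused across studies with different quality demands.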

Consistency and Reproducibility

Scales

Quality Control can be thought of as a measurement process of its own, and hence in terms of intra- and inter-rater reliability. One should choose a grading scheme for quality assessments which, under blinded conditions, can be reproduced in intra-rater tests, and (after training and practice) is reproducible by others, as well as consistent across time. Thus, there is a tension between ease of rating (favouring binary pass/fail scales) and capture of variability (favouring many individual grading values).
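One common way to quantify this reliability is Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal sketch, where the two rating sessions are fabricated for illustration:

```python
# Sketch: Cohen's kappa for intra-rater (test-retest) or inter-rater
# agreement on categorical QC ratings of the same images.
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two rating sets of equal length."""
    n = len(ratings_a)
    # Observed proportion of items rated identically.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Same rater, same images, two blinded sessions (illustrative pass/fail data):
session_1 = [1, 1, 0, 1, 0, 1, 1, 0]
session_2 = [1, 1, 0, 1, 1, 1, 0, 0]
kappa = cohens_kappa(session_1, session_2)
```

A kappa near 1 indicates a reproducible scale; a low kappa suggests the scale has more steps than the rater can reliably distinguish, or that more training is needed.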

Presentation

In order to maximize reproducibility of ratings, variation of evaluation methods should be minimized. Views of the data should be presented with consistent orientation, intensity, contrast and field-of-view. Data variation not directly indicative of quality markers under evaluation should be standardized. This means that the best condition to evaluate some types of "raw" data may be after some preprocessing has been applied.

Time

Sufficient time should be spent on QC to ensure the goals of Consistency and Reproducibility, while balancing the demands of productivity. Optimizing a highly refined and detailed QC process is well and good for a 30-subject study, but consider the implications of such a process applied to a 10,000 subject study. Translatable methods at any scale allow for comparable best practices across many studies. Do not let the perfect be the enemy of the good.

"Quick" QC

A "Quick" QC should be performed immediately after all data collection for the purposes of monitoring processes and detecting failures of machinery and procedures. This quick check takes on an important additional dimension in the case of human data, where researchers are ethically expected to report incidental findings for medical follow up. This quick QC is not a substitute for the comprehensive reproducible process.

Blinding

Subject naming should be such that QC is done under blinded conditions, so that raters do not know whether they are rating control or test subjects.
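Blinding can be implemented by assigning randomized codes before QC and keeping the key out of the rater's reach. A minimal sketch, where the subject names, ID format, and key-file name are all illustrative:

```python
# Sketch: assign randomized blinded IDs so raters cannot infer group
# membership from filenames. Names and formats here are illustrative.
import csv
import random

def blind_ids(subjects, seed=None):
    """Return a {original: blinded} mapping with shuffled sequential codes."""
    rng = random.Random(seed)  # seed only for reproducible demonstration
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {s: f"qc_{i:04d}" for i, s in enumerate(shuffled, start=1)}

subjects = ["control_01", "control_02", "patient_01", "patient_02"]
mapping = blind_ids(subjects, seed=42)

# Store the key away from the rater's working directory, to be rejoined
# with the ratings only after QC is complete.
with open("blinding_key.csv", "w", newline="") as f:
    csv.writer(f).writerows(sorted(mapping.items()))
```

The blinded codes carry no group information, and shuffling breaks any correspondence between acquisition order and code order.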

General Processes

Data, regardless of its QC status, should be included in downstream processing whenever possible, except when that processing mixes images together (as in modelbuild/DBM). This is because some processing pipelines are robust to certain QC-failing conditions (for example, MAGeT-brain appears to be insensitive to motion artifacts).

QC should be performed on static images, sliced through 3D volumes across multiple axes, covering the full field-of-view of the region to be evaluated, along with a sufficient buffer to provide suitable context. In the case of raw-QC, the field-of-view should include the image background, and image intensity levels should be set so that background noise is enhanced, as some raw-QC quality features have signatures in the background noise.
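The slicing step above can be sketched as follows. This uses a synthetic volume for self-containment; a real pipeline would load images with a library such as nibabel and render the slices (e.g. with matplotlib), and the slice counts and windowing percentile here are illustrative choices.

```python
# Sketch: evenly spaced 2D slices along each axis of a 3D volume,
# as a basis for static QC images. Synthetic data; parameters illustrative.
import numpy as np

def slice_mosaic(volume, n_slices=5):
    """Return evenly spaced slices along each of the three axes."""
    slices = {}
    for axis in range(3):
        idx = np.linspace(0, volume.shape[axis] - 1, n_slices).astype(int)
        slices[axis] = [np.take(volume, i, axis=axis) for i in idx]
    return slices

volume = np.random.default_rng(0).random((64, 64, 32))
mosaic = slice_mosaic(volume)

# For raw-QC, window intensities so background noise becomes visible,
# e.g. by clipping at a low upper percentile (value is illustrative):
windowed = np.clip(volume, 0, np.percentile(volume, 75))
```

Fixing the slice indices and the intensity window across all subjects serves the consistency goal above: every image is viewed under identical conditions.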

Wherever possible, the static QC images generated by the specific pipeline in use should be used for QC evaluation, as these have been optimized to show the features most relevant to evaluating its outputs.

Viewing images, and logging results

An image viewer which can both display images and log a score is strongly recommended, to avoid switching between multiple windows. The recommended tool for this is PyQC, developed in-lab; in the future a web-based QC tool may be implemented to share the QC workload (and allow multi-rater comparison).

Using PyQC

PyQC is available as a module (module load PyQC). After loading, it is run from the command line and pointed at either a list of image files (jpg, jpeg, png, gif, webp) or a directory, from which all files of those types will be loaded. A window opens with a spreadsheet on the left side and the first image fit-to-the-screen on the right; the window should be maximized to make the image under view as large as possible.

- Navigate the list of images by clicking a row in the table, or by pressing w or / for up and s or * for down.
- Press any number key to log the score in the current cell and advance to the next cell. The numpad on the right of the keyboard is ideally suited for this process.
- Each row contains two rating cells, originally intended for rating image quality and processing scores for human preprocessing. If doing a single rating, simply press the same key twice.
- One can also zoom in/out with + and -, but consider reproducibility and time considerations if you are considering this level of evaluation.

QC of Raw Human Structural MRI

The specific quality features of human structural MRI are best assessed after bias fields have been corrected and intensity ranges adjusted, as low background levels can hide ghosting and other motion artifacts. The iterativeN3 pipeline provides several QC images for evaluating its outputs which are also suitable for assessing raw image quality.

QC of Human Preprocessing

QC of Raw Animal Structural MRI

QC of Animal Preprocessing

QC of modelbuild results

QC of images not otherwise specified
