-
Notifications
You must be signed in to change notification settings - Fork 0
/
data-quality-metrics.Rmd
104 lines (71 loc) · 4.12 KB
/
data-quality-metrics.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
---
title: "Data quality metrics"
date: "Last updated: `r format(Sys.time(), '%B %d, %Y')`"
bibliography: bibliography.bib
csl: american-medical-association.csl
link-citations: yes
output:
html_document:
toc: TRUE
toc_depth: 2
toc_float:
collapsed: FALSE
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r, child = "js/back-to-top.js"}
```
```{css, echo = FALSE}
.warning {
margin-bottom: 20px;
}
```
<br>
Listed below are the data quality metrics that Gannet computes during data processing, signal fitting, and tissue segmentation.
## Linewidth
Linewidth is calculated as the full-width half-maximum (FWHM) (in Hz) of fitted model signals. When reporting linewidths of datasets, it is recommended to use the FWHM of Cr or NAA from the OFF spectrum, or the FWHM of the water reference (if a water reference is provided).
## Signal-to-noise ratio (SNR)
The SNR of fitted signals is calculated as the amplitude of the given signal model divided by twice the standard deviation of the noise signal. To estimate the noise signal, Gannet takes two independent segments of the OFF or DIFF spectrum (as appropriate to the modeled signal of interest) between 8–9 ppm and 9–10 ppm, and detrends them using a second-order polynomial function. Detrending is performed to remove baseline artifacts (often related to the residual water signal). The standard deviation of each detrended noise segment is then calculated. The smaller of the two standard deviations is then used as the estimate of noise, which is then multiplied by 2.
Formulaically, this is defined as:
$$
SNR_{signal} = \frac{A_{signal}}{2\cdot\mathrm{std}(noise)}
$$
where:
| <u>Parameter</u> | <u>Definition</u> |
| :- | :--------- |
| $A_{signal}$ | Signal model amplitude |
| $noise$ | Detrended noise signal between either 8–9 or 9–10 ppm (whichever produces a lower standard deviation) in the appropriate spectrum (i.e., either the OFF or DIFF spectrum) |
## Frequency offsets (frequency drift/motion)
To estimate the degree of frequency offsets that result from scanner-related frequency drift [@Hui2021a] and participant motion [@Evans2013], Gannet calculates an average frequency offset [@Mikkelsen2017] $\overline{\Delta\delta_{0}}$. This is calculated as the mean (over the course of the acquisition) difference between the observed frequency of the residual water signal in the pre-frequency-corrected subspectra and the nominal water frequency $\delta_{0}$ at 4.68 ppm (4.8 ppm for room-temperature phantoms), or the nominal Cr frequency at 3.02 ppm for HERMES acquisitions. It should be noted that using the mean of offset differences does not fully characterize frequency offsets but is a useful heuristic.
Formulaically, this is defined as:
$$
\overline{\Delta\delta_{0}} = \frac{1}{m}\sum{\widehat{\delta_{0,m}} - \delta_{0}}
$$
where:
| <u>Parameter</u> | <u>Definition</u> |
| :- | :--------- |
| $m$ | Each individual subspectrum index number |
| $\widehat{\delta_{0,m}}$ | Observed water or Cr frequency in each individual subspectrum |
| $\delta_{0}$ | Nominal water or Cr frequency |
## Fit error
When fitting signal functions to metabolite peaks, Gannet will also estimate the error of the fit. This is defined as the standard deviation of the residuals of the signal model fit divided by the signal model amplitude and multiplied by 100 to give a percentage.
For metabolite peak fits, this is:
$$
\epsilon_{metab} = 100\cdot\frac{\mathrm{std}(resid_{metab})}{A_{metab}}
$$
where:
| <u>Parameter</u> | <u>Definition</u> |
| :- | :--------- |
| $resid_{metab}$ | Signal model fit residuals |
| $A_{metab}$ | Signal model amplitude |
Similarly, for reference signal fits, this is:
$$
\epsilon_{ref} = 100\cdot\frac{\mathrm{std}(resid_{ref})}{A_{ref}}
$$
Since all metabolites are normalized to a reference signal (either Cr or unsuppressed water) and reported as such, the fit error that really should be considered (and reported) is the combined error of the metabolite and reference signal model fits, which add up in quadrature.
Formulaically, this is defined as:
$$
\epsilon_{metab,ref} = \sqrt{\epsilon_{metab}^2 + \epsilon_{ref}^2}
$$
### References