Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding new peak quality metrics to findChromPeaks for CentWave #685

Merged
merged 27 commits into from
Nov 28, 2023
Merged
Show file tree
Hide file tree
Changes from 24 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
53404ff
Adding a new peak shape and SNR estimate using a skewed beta distribu…
wkumler Jul 18, 2023
64182e5
Adding new beta option to CentWaveParam and downstream objects
wkumler Jul 18, 2023
548dce4
Adding beta parameters to peak df output
wkumler Jul 18, 2023
1919a11
Adding unit tests for new beta columns
wkumler Jul 31, 2023
26c97bf
Adding verboseBetaColumns to predIsoROIs
wkumler Jul 31, 2023
8062c56
Adding documentation to code, needs compilation
wkumler Jul 31, 2023
2ae041f
Compiling documentation
wkumler Jul 31, 2023
852efd1
Adding extendLengthMSW and verboseBetaColumns to PredIsoParam
wkumler Jul 31, 2023
08009b5
Adding dbeta from stats to NAMESPACE per R CMD check recommendation
wkumler Jul 31, 2023
6dd26b7
Actually adding verboseBetaColumns to documentation
wkumler Aug 1, 2023
3767cc3
Updating faahko_sub to use new verboseBetaColumns
wkumler Aug 1, 2023
5c79e68
Adding slot names to param docs
wkumler Aug 1, 2023
1b2846e
Fixing faahko_sub object with new filenames?
wkumler Aug 1, 2023
3f3b1dd
Fixing faahko_sub object with new filenames?
wkumler Aug 1, 2023
25da195
Merge branch 'devel' of https://github.com/wkumler/xcms into devel
wkumler Aug 1, 2023
9580e9d
Fixing faahko_sub object with new filenames?
wkumler Aug 1, 2023
ad30aa7
Adding link to Github PR and BMC manuscript
wkumler Nov 16, 2023
db246b8
Moving function definiton outside of loop
wkumler Nov 16, 2023
ad89d6e
Adding link to BMC manuscript
wkumler Nov 16, 2023
5d56f67
Converting beta calculations to standalone function
wkumler Nov 16, 2023
1b5fa3e
Adding unit tests for beta calculations and debugging
wkumler Nov 16, 2023
d734b49
Updating documentation
wkumler Nov 16, 2023
db92755
Adding beta values to filled peaks
wkumler Nov 18, 2023
42e0ab7
Implementing beta fitting during peak filling
wkumler Nov 21, 2023
a5ec78a
Switching to rtime- and NA-handling .get_beta_values function
wkumler Nov 24, 2023
44cd066
Tidying code now that .get_beta_values returns vec not list
wkumler Nov 24, 2023
6eb7e5b
Bugfixing, adding documentation and updating unit tests
wkumler Nov 24, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ importFrom("stats", "aov", "approx", "convolve", "cor", "deriv3",
"dist", "fft", "fitted", "lm", "loess", "lsfit", "median",
"na.omit", "nextn", "nls", "predict", "pt", "quantile",
"runmed", "sd", "stepfun", "weighted.mean", "density", "approxfun",
"rnorm", "runif")
"rnorm", "runif", "dbeta")
importFrom("utils", "flush.console", "head", "object.size",
"packageVersion", "read.csv", "tail", "write.csv",
"write.table")
Expand Down
18 changes: 14 additions & 4 deletions R/DataClasses.R
Original file line number Diff line number Diff line change
Expand Up @@ -441,6 +441,14 @@ setClass("XProcessHistory",
#' method to extend the EIC to a integer base-2 length prior to being passed to
#' \code{convolve} rather than the default "reflect" method. See
#' https://github.com/sneumann/xcms/issues/445 for more information.
#'
#' @param verboseBetaColumns Option to calculate two additional metrics of peak
#' quality via comparison to an idealized bell curve. Adds \code{beta_cor} and
#' \code{beta_snr} to the \code{chromPeaks} output, corresponding to a Pearson
#' correlation coefficient to a bell curve with several degrees of skew as well
#' as an estimate of signal-to-noise using the residuals from the best-fitting
#' bell curve. See https://github.com/sneumann/xcms/pull/685 and
#' https://doi.org/10.1186/s12859-023-05533-4 for more information.
#'
#' @details
#'
Expand Down Expand Up @@ -491,7 +499,7 @@ NULL
#' for a chromatographic peak detection using the centWave method. Instances
#' should be created with the \code{CentWaveParam} constructor.
#'
#' @slot ppm,peakwidth,snthresh,prefilter,mzCenterFun,integrate,mzdiff,fitgauss,noise,verboseColumns,roiList,firstBaselineCheck,roiScales,extendLengthMSW See corresponding parameter above. Slots values should exclusively be accessed
#' @slot ppm,peakwidth,snthresh,prefilter,mzCenterFun,integrate,mzdiff,fitgauss,noise,verboseColumns,roiList,firstBaselineCheck,roiScales,extendLengthMSW,verboseBetaColumns See corresponding parameter above. Slots values should exclusively be accessed
#' \emph{via} the corresponding getter and setter methods listed above.
#'
#' @rdname findChromPeaks-centWave
Expand Down Expand Up @@ -533,7 +541,8 @@ setClass("CentWaveParam",
roiList = "list",
firstBaselineCheck = "logical",
roiScales = "numeric",
extendLengthMSW = "logical"
extendLengthMSW = "logical",
verboseBetaColumns = "logical"
),
contains = c("Param"),
prototype = prototype(
Expand All @@ -550,7 +559,8 @@ setClass("CentWaveParam",
roiList = list(),
firstBaselineCheck = TRUE,
roiScales = numeric(),
extendLengthMSW = FALSE
extendLengthMSW = FALSE,
verboseBetaColumns = FALSE
),
validity = function(object) {
msg <- character()
Expand Down Expand Up @@ -1242,7 +1252,7 @@ NULL
#' \code{\link{CentWaveParam}} for all methods and arguments this class
#' inherits.
#'
#' @slot ppm,peakwidth,snthresh,prefilter,mzCenterFun,integrate,mzdiff,fitgauss,noise,verboseColumns,roiList,firstBaselineCheck,roiScales,snthreshIsoROIs,maxCharge,maxIso,mzIntervalExtension,polarity
#' @slot ppm,peakwidth,snthresh,prefilter,mzCenterFun,integrate,mzdiff,fitgauss,noise,verboseColumns,roiList,firstBaselineCheck,roiScales,extendLengthMSW,verboseBetaColumns,snthreshIsoROIs,maxCharge,maxIso,mzIntervalExtension,polarity
#' See corresponding parameter above.
#'
#' @rdname findChromPeaks-centWaveWithPredIsoROIs
Expand Down
167 changes: 128 additions & 39 deletions R/do_findChromPeaks-functions.R
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,12 @@
#' \item{scmin}{Left peak limit found by wavelet analysis (scan number).}
#' \item{scmax}{Right peak limit found by wavelet analysis (scan numer).}
#' }
#' Additional columns for \code{verboseBetaColumns = TRUE}:
#' \describe{
#'
#' \item{beta_cor}{Correlation between an "ideal" bell curve and the raw data}
#' \item{beta_snr}{Signal-to-noise residuals calculated from the beta_cor fit}
#' }
#'
#' @author Ralf Tautenhahn, Johannes Rainer
#'
Expand Down Expand Up @@ -145,7 +151,8 @@ do_findChromPeaks_centWave <- function(mz, int, scantime, valsPerSpect,
firstBaselineCheck = TRUE,
roiScales = NULL,
sleep = 0,
extendLengthMSW = FALSE) {
extendLengthMSW = FALSE,
verboseBetaColumns = FALSE) {
if (getOption("originalCentWave", default = TRUE)) {
## message("DEBUG: using original centWave.")
.centWave_orig(mz = mz, int = int, scantime = scantime,
Expand All @@ -156,7 +163,8 @@ do_findChromPeaks_centWave <- function(mz, int, scantime, valsPerSpect,
verboseColumns = verboseColumns, roiList = roiList,
firstBaselineCheck = firstBaselineCheck,
roiScales = roiScales, sleep = sleep,
extendLengthMSW = extendLengthMSW)
extendLengthMSW = extendLengthMSW,
verboseBetaColumns = verboseBetaColumns)
} else {
## message("DEBUG: using modified centWave.")
.centWave_new(mz = mz, int = int, scantime = scantime,
Expand All @@ -178,7 +186,7 @@ do_findChromPeaks_centWave <- function(mz, int, scantime, valsPerSpect,
noise = 0, ## noise.local=TRUE,
sleep = 0, verboseColumns = FALSE, roiList = list(),
firstBaselineCheck = TRUE, roiScales = NULL,
extendLengthMSW = FALSE) {
extendLengthMSW = FALSE, verboseBetaColumns = FALSE) {
## Input argument checking.
if (missing(mz) | missing(int) | missing(scantime) | missing(valsPerSpect))
stop("Arguments 'mz', 'int', 'scantime' and 'valsPerSpect'",
Expand Down Expand Up @@ -218,6 +226,7 @@ do_findChromPeaks_centWave <- function(mz, int, scantime, valsPerSpect,
"into", "intb", "maxo", "sn")
verbosenames <- c("egauss", "mu", "sigma", "h", "f", "dppm", "scale",
"scpos", "scmin", "scmax", "lmin", "lmax")
betanames <- c("beta_cor", "beta_snr")

## Peak width: seconds to scales
scalerange <- round((peakwidth / mean(diff(scantime))) / 2)
Expand All @@ -226,15 +235,19 @@ do_findChromPeaks_centWave <- function(mz, int, scantime, valsPerSpect,
scalerange <- scalerange[-z]
if (length(scalerange) < 1) {
warning("No scales? Please check peak width!")
if (verboseColumns) {
nopeaks <- matrix(nrow = 0, ncol = length(basenames) +
length(verbosenames))
colnames(nopeaks) <- c(basenames, verbosenames)
} else {
nopeaks <- matrix(nrow = 0, ncol = length(basenames))
colnames(nopeaks) <- c(basenames)
}
return(invisible(nopeaks))
matrix_length <- length(basenames)
matrix_names <- basenames
if (verboseColumns) {
matrix_length <- matrix_length + length(verbosenames)
matrix_names <- c(matrix_names, verbosenames)
}
if (verboseBetaColumns) {
matrix_length <- matrix_length + length(betanames)
matrix_names <- c(matrix_names, betanames)
}
nopeaks <- matrix(nrow = 0, ncol = matrix_length)
colnames(nopeaks) <- matrix_names
return(invisible(nopeaks))
}

if (length(scalerange) > 1)
Expand Down Expand Up @@ -319,15 +332,19 @@ do_findChromPeaks_centWave <- function(mz, int, scantime, valsPerSpect,
## ROI.list <- findmzROI(object,scanrange=scanrange,dev=ppm * 1e-6,minCentroids=minCentroids, prefilter=prefilter, noise=noise)
if (length(roiList) == 0) {
warning("No ROIs found! \n")
if (verboseColumns) {
nopeaks <- matrix(nrow = 0, ncol = length(basenames) +
length(verbosenames))
colnames(nopeaks) <- c(basenames, verbosenames)
} else {
nopeaks <- matrix(nrow = 0, ncol = length(basenames))
colnames(nopeaks) <- c(basenames)
}
return(invisible(nopeaks))
matrix_length <- length(basenames)
matrix_names <- basenames
if (verboseColumns) {
matrix_length <- matrix_length + length(verbosenames)
matrix_names <- c(matrix_names, verbosenames)
}
if (verboseBetaColumns) {
matrix_length <- matrix_length + length(betanames)
matrix_names <- c(matrix_names, betanames)
}
nopeaks <- matrix(nrow = 0, ncol = matrix_length)
colnames(nopeaks) <- matrix_names
return(invisible(nopeaks))
}
}

Expand Down Expand Up @@ -525,7 +542,8 @@ do_findChromPeaks_centWave <- function(mz, int, scantime, valsPerSpect,
td[best.scale.pos],
td[lwpos],
td[rwpos], ## Peak positions guessed from the wavelet's (scan nr)
NA, NA)) ## Peak limits (scan nr)
NA, NA, ## Peak limits (scan nr)
NA, NA)) ## Beta fitting values
peakinfo <- rbind(peakinfo,
c(best.scale, best.scale.nr,
best.scale.pos, lwpos, rwpos))
Expand All @@ -538,7 +556,7 @@ do_findChromPeaks_centWave <- function(mz, int, scantime, valsPerSpect,

## postprocessing
if (!is.null(peaks)) {
colnames(peaks) <- c(basenames, verbosenames)
colnames(peaks) <- c(basenames, verbosenames, betanames)
colnames(peakinfo) <- c("scale", "scaleNr", "scpos",
"scmin", "scmax")
for (p in 1:dim(peaks)[1]) {
Expand All @@ -561,6 +579,16 @@ do_findChromPeaks_centWave <- function(mz, int, scantime, valsPerSpect,
lm <- .narrow_rt_boundaries(lm, d)
lm_seq <- lm[1]:lm[2]
pd <- d[lm_seq]

# Implement a fit of a skewed gaussian (beta distribution)
# for peak shape and within-peak signal-to-noise ratio
# See https://doi.org/10.1186/s12859-023-05533-4 and
# https://github.com/sneumann/xcms/pull/685
if(verboseBetaColumns){
beta_vals <- .get_beta_values(pd)
peaks[p, "beta_cor"] <- beta_vals$best_cor
peaks[p, "beta_snr"] <- beta_vals$beta_snr
}

peakrange <- td[lm]
peaks[p, "rtmin"] <- scantime[peakrange[1]]
Expand Down Expand Up @@ -675,22 +703,30 @@ do_findChromPeaks_centWave <- function(mz, int, scantime, valsPerSpect,

if (length(peaklist) == 0) {
warning("No peaks found!")

if (verboseColumns) {
nopeaks <- matrix(nrow = 0, ncol = length(basenames) +
length(verbosenames))
colnames(nopeaks) <- c(basenames, verbosenames)
} else {
nopeaks <- matrix(nrow = 0, ncol = length(basenames))
colnames(nopeaks) <- c(basenames)
}
message(" FAIL: none found!")
return(nopeaks)
matrix_length <- length(basenames)
matrix_names <- basenames
if (verboseColumns) {
matrix_length <- matrix_length + length(verbosenames)
matrix_names <- c(matrix_names, verbosenames)
}
if (verboseBetaColumns) {
matrix_length <- matrix_length + length(betanames)
matrix_names <- c(matrix_names, betanames)
}
nopeaks <- matrix(nrow = 0, ncol = matrix_length)
colnames(nopeaks) <- matrix_names
message(" FAIL: none found!")
return(nopeaks)
}
p <- do.call(rbind, peaklist)
if (!verboseColumns)
p <- p[, basenames, drop = FALSE]

keepcols <- basenames
if (verboseColumns){
keepcols <- c(keepcols, verbosenames)
}
if(verboseBetaColumns){
keepcols <- c(keepcols, betanames)
}
p <- p[, keepcols, drop = FALSE]
uorder <- order(p[, "into"], decreasing = TRUE)
pm <- as.matrix(p[,c("mzmin", "mzmax", "rtmin", "rtmax"), drop = FALSE])
uindex <- rectUnique(pm, uorder, mzdiff, ydiff = -0.00001) ## allow adjacent peaks
Expand Down Expand Up @@ -2623,6 +2659,11 @@ do_findKalmanROI <- function(mz, int, scantime, valsPerSpect,
#' \item{scmin}{Left peak limit found by wavelet analysis (scan number).}
#' \item{scmax}{Right peak limit found by wavelet analysis (scan numer).}
#' }
#' Additional columns for \code{verboseBetaColumns = TRUE}:
#' \describe{
#' \item{beta_cor}{Correlation between an "ideal" bell curve and the raw data}
#' \item{beta_snr}{Signal-to-noise residuals calculated from the beta_cor fit}
#' }
#'
#' @rdname do_findChromPeaks_centWaveWithPredIsoROIs
#'
Expand All @@ -2634,7 +2675,8 @@ do_findChromPeaks_centWaveWithPredIsoROIs <-
verboseColumns = FALSE, roiList = list(),
firstBaselineCheck = TRUE, roiScales = NULL, snthreshIsoROIs = 6.25,
maxCharge = 3, maxIso = 5, mzIntervalExtension = TRUE,
polarity = "unknown", extendLengthMSW = FALSE) {
polarity = "unknown", extendLengthMSW = FALSE,
verboseBetaColumns = FALSE) {
## Input argument checking: most of it will be done in
## do_findChromPeaks_centWave
polarity <- match.arg(polarity, c("positive", "negative", "unknown"))
Expand All @@ -2655,7 +2697,8 @@ do_findChromPeaks_centWaveWithPredIsoROIs <-
roiList = roiList,
firstBaselineCheck = firstBaselineCheck,
roiScales = roiScales,
extendLengthMSW = extendLengthMSW)
extendLengthMSW = extendLengthMSW,
verboseBetaColumns = verboseBetaColumns)
return(do_findChromPeaks_addPredIsoROIs(mz = mz, int = int,
scantime = scantime,
valsPerSpect = valsPerSpect,
Expand Down Expand Up @@ -3668,3 +3711,49 @@ peaksWithCentWave <- function(int, rt,
}
lm
}


#' @description
#'
#' Calculate beta parameters for a chromatographic peak, both its similarity
#' to a bell curve of varying degrees of skew and the standard deviation of the
#' residuals after the best-fit bell is normalized and subtracted.
#'
#' @param eic_ints A numeric vector corresponding to the peak intensities
jorainer marked this conversation as resolved.
Show resolved Hide resolved
#' @param skews A numeric vector of the skews to try, corresponding to the
#' shape1 of dbeta with a shape2 of 5. Values less than 5 will be increasingly
#' right-skewed, while values greater than 5 will be left-skewed.
#'
#' @author William Kumler
#'
#' @noRd
.get_beta_values <- function(eic_ints, skews=c(3, 3.5, 4, 4.5, 5)){
if(length(eic_ints)<5){
jorainer marked this conversation as resolved.
Show resolved Hide resolved
best_cor <- NA
beta_snr <- NA
} else {
beta_sequence <- rep(seq(0, 1, length.out=length(eic_ints)), each=5)
beta_vals <- t(matrix(dbeta(beta_sequence, shape1 = skews, shape2 = 5), nrow = 5))
# matplot(beta_vals)
beta_cors <- cor(eic_ints, beta_vals)
jorainer marked this conversation as resolved.
Show resolved Hide resolved
best_cor <- max(beta_cors)
best_curve <- beta_vals[,which.max(beta_cors)]
noise_level <- sd(diff(.scale_zero_one(best_curve)-.scale_zero_one(eic_ints)))
beta_snr <- log10(max(eic_ints)/noise_level)
jorainer marked this conversation as resolved.
Show resolved Hide resolved
}
list(best_cor=best_cor, beta_snr=beta_snr)
}

#' @description
#'
#' Simple helper function to scale a numeric vector between zero and one by
#' subtracting the lowest value and dividing by the maximum.
#'
#' @param num_vec `numeric` vector, typically chromatographic intensities.
#'
#' @author William Kumler
#'
#' @noRd
.scale_zero_one <- function(num_vec){
(num_vec-min(num_vec))/(max(num_vec)-min(num_vec))
jorainer marked this conversation as resolved.
Show resolved Hide resolved
}
8 changes: 6 additions & 2 deletions R/functions-Params.R
Original file line number Diff line number Diff line change
Expand Up @@ -86,14 +86,15 @@ CentWaveParam <- function(ppm = 25, peakwidth = c(20, 50), snthresh = 10,
integrate = 1L, mzdiff = -0.001, fitgauss = FALSE,
noise = 0, verboseColumns = FALSE, roiList = list(),
firstBaselineCheck = TRUE, roiScales = numeric(),
extendLengthMSW = FALSE) {
extendLengthMSW = FALSE, verboseBetaColumns = FALSE) {
return(new("CentWaveParam", ppm = ppm, peakwidth = peakwidth,
snthresh = snthresh, prefilter = prefilter,
mzCenterFun = mzCenterFun, integrate = as.integer(integrate),
mzdiff = mzdiff, fitgauss = fitgauss, noise = noise,
verboseColumns = verboseColumns, roiList = roiList,
firstBaselineCheck = firstBaselineCheck, roiScales = roiScales,
extendLengthMSW = extendLengthMSW))
extendLengthMSW = extendLengthMSW,
verboseBetaColumns=verboseBetaColumns))
}

#' @return The \code{MatchedFilterParam} function returns a
Expand Down Expand Up @@ -213,6 +214,7 @@ CentWavePredIsoParam <- function(ppm = 25, peakwidth = c(20, 50), snthresh = 10,
integrate = 1L, mzdiff = -0.001, fitgauss = FALSE,
noise = 0, verboseColumns = FALSE, roiList = list(),
firstBaselineCheck = TRUE, roiScales = numeric(),
extendLengthMSW = FALSE, verboseBetaColumns = FALSE,
snthreshIsoROIs = 6.25, maxCharge = 3, maxIso = 5,
mzIntervalExtension = TRUE, polarity = "unknown") {
return(new("CentWavePredIsoParam", ppm = ppm, peakwidth = peakwidth,
Expand All @@ -221,6 +223,8 @@ CentWavePredIsoParam <- function(ppm = 25, peakwidth = c(20, 50), snthresh = 10,
mzdiff = mzdiff, fitgauss = fitgauss, noise = noise,
verboseColumns = verboseColumns, roiList = roiList,
firstBaselineCheck = firstBaselineCheck, roiScales = roiScales,
extendLengthMSW = extendLengthMSW,
verboseBetaColumns = verboseBetaColumns,
snthreshIsoROIs = snthreshIsoROIs, maxIso = as.integer(maxIso),
maxCharge = as.integer(maxCharge),
mzIntervalExtension = mzIntervalExtension, polarity = polarity))
Expand Down
6 changes: 6 additions & 0 deletions R/functions-XCMSnExp.R
Original file line number Diff line number Diff line change
Expand Up @@ -295,6 +295,12 @@ dropGenericProcessHistory <- function(x, fun) {
meanMz <- do.call(mzCenterFun, list(mtx[, 2], mtx[, 3]))
if (is.na(meanMz)) meanMz <- mtx[maxi[1], 2]
res[i, "mz"] <- meanMz

if("beta_cor"%in%cn){
beta_vals <- .get_beta_values(mtx[,3])
res[i, "beta_cor"] <- beta_vals$best_cor
res[i, "beta_snr"] <- beta_vals$beta_snr
}
} else {
res[i, ] <- rep(NA_real_, ncols)
}
Expand Down
Binary file modified data/faahko_sub.RData
Binary file not shown.
Loading
Loading