
Added expansion factors for the mz and rt range used in fillPeaks.chrom. #3

Open · wants to merge 6 commits into base: master

Conversation

stanstrup
Contributor

No description provided.

@sneumann
Owner

Hi, I am trying a few things with the new code, and would like to add
a unitTest before pulling.

    library(faahKO)
    library(RUnit)

    xsg <- group(faahko)

    xsgf1 <- fillPeaks(xsg, method="chrom")
    xsgf0909 <- fillPeaks(xsg, method="chrom", expand.rt=0.9, expand.mz=0.9)
    xsgf1111 <- fillPeaks(xsg, method="chrom", expand.rt=1.1, expand.mz=1.1)

    # Perfect correlation of xsgf1 against itself, as expected:
    plot(peaks(xsgf1)[xsgf1@filled, "into"], peaks(xsgf1)[xsgf1@filled, "into"])

    # Shouldn't all peaks in xsgf1111 be >= xsgf1?
    plot(peaks(xsgf1)[xsgf1@filled, "into"], peaks(xsgf1111)[xsgf1111@filled, "into"])

    # Shouldn't all peaks in xsgf0909 be <= xsgf1?
    plot(peaks(xsgf1)[xsgf1@filled, "into"], peaks(xsgf0909)[xsgf0909@filled, "into"])

    # Actual tests:
    checkTrue(all(peaks(xsgf1)[xsgf1@filled, "into"] == peaks(xsgf1)[xsgf1@filled, "into"]))
    checkTrue(all(peaks(xsgf1)[xsgf1@filled, "into"] <= peaks(xsgf1111)[xsgf1111@filled, "into"]))
    checkTrue(all(peaks(xsgf1)[xsgf1@filled, "into"] >= peaks(xsgf0909)[xsgf0909@filled, "into"]))

@stanstrup
Contributor Author

Yes, I would expect the same. Is that not the case? If not, I can investigate next week when I get back.

@sneumann
Owner

[screenshot from 2014-04-23 20:05:58]

@stanstrup
Contributor Author

Ok. Damn. I will investigate next week. I wonder if these are at the extremes of the retention time range and getPeaks doesn't like it or something.

@stanstrup
Contributor Author

The added zeros were indeed because getPeaks doesn't handle the case where the rt range of the file is exceeded. I have added a check that limits rtmin and rtmax to the limits of each file (in my fork; pull request not submitted yet).
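
A minimal base-R sketch of such a clamp (the function name clamp_rt and the numbers are made up for illustration; the real patch operates inside fillPeaks.chrom):

```r
# Clamp a requested rt window to the scan times actually present in one
# file, so getPeaks is never asked to integrate outside the data.
clamp_rt <- function(rtmin, rtmax, stime) {
  c(rtmin = max(rtmin, min(stime)),
    rtmax = min(rtmax, max(stime)))
}

stime <- seq(2500, 4500, by = 2)   # hypothetical scan times of one file
clamp_rt(2400, 2600, stime)        # rtmin raised to 2500
clamp_rt(4400, 4700, stime)        # rtmax lowered to 4500
```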

There is still something strange going on: getPeaks is returning slightly lower values for some peaks in some samples even with an expanded peakrange. I have not figured out why yet. I guess I will have to understand exactly how getPeaks works to figure that out, so no pull request yet.

An example showing the problem:

    library(faahKO)
    library(RUnit)
    library(stringr)

    xsg <- group(faahko)

    xsgf1 <- fillPeaks(xsg, method="chrom")
    xsgf1111 <- fillPeaks(xsg, method="chrom", expand.rt=1.1, expand.mz=1.1)

    select <- !(peaks(xsgf1)[xsgf1@filled, "into"] <= peaks(xsgf1111)[xsgf1111@filled, "into"])

    View(peaks(xsgf1)[xsgf1@filled[select], ])
    View(peaks(xsgf1111)[xsgf1111@filled[select], ])

Any ideas?

@stanstrup
Contributor Author

I have tracked down the problem: getPeaks calculates an average peakwidth:

    pwid <- diff(stime[iret])/diff(iret)

With the expanded peakrange this leads to the inclusion of additional scans, as intended. The problem is that the extra scans might have zero intensity, and then the only difference in the calculated peak area comes from the average peakwidth used in the calculation:

    rmat[i,7] <- pwid*sum(ymax)

This average peakwidth is occasionally lower if the extra scan has a slightly shorter than "normal" width, which in turn makes the calculated area lower. The difference is really small in the cases I have seen.

So the only solutions I can see are to use the per-scan peakwidth in the calculation, or to drop this particular unit test. I would be hesitant to change such a fundamental calculation in xcms...
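
The averaging effect described above can be reproduced with a tiny standalone example (stime and ymax are invented numbers, not faahKO data):

```r
# One peak with an extra, slightly narrower zero-intensity scan on each side.
stime <- c(100.0, 101.0, 102.0, 103.0, 103.8)  # last scan gap is shorter
ymax  <- c(0, 50, 80, 50, 0)

# Area the way getPeaks computes it: average scan width times summed intensity.
area_avg <- function(first, last) {
  iret <- c(first, last)
  pwid <- diff(stime[iret]) / diff(iret)  # average peakwidth over the range
  pwid * sum(ymax[first:last])
}

area_avg(2, 4)  # 180: pwid = 1.0 over the original range
area_avg(1, 5)  # 171: only zeros were added, yet pwid drops to 0.95
```

So the expanded range can report a smaller area even though no intensity was removed, which is exactly the behaviour the unit test tripped over.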

@stanstrup
Contributor Author

I have added min.width.mz and min.width.rt arguments. In my test dataset, setting a reasonable min.width.mz was much more important than expanding the range by a factor: it almost cut the zeros in half, whereas the expansion factor only cut a few percentage points.

The dataset had some features with ridiculously narrow mz ranges, making re-integration impossible:

[image: mz_range]
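
A hypothetical illustration of the min.width idea (ensure_min_width and min.width are illustrative names; the actual arguments in the pull request are min.width.mz and min.width.rt):

```r
# Widen a range symmetrically around its centre if it is narrower than a
# minimum width; otherwise leave it untouched.
ensure_min_width <- function(lo, hi, min.width) {
  if ((hi - lo) < min.width) {
    centre <- (lo + hi) / 2
    lo <- centre - min.width / 2
    hi <- centre + min.width / 2
  }
  c(lo, hi)
}

ensure_min_width(200.001, 200.002, 0.01)  # widened to a 0.01-wide window
ensure_min_width(200.0, 200.5, 0.01)      # already wide enough: unchanged
```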

…krange matrix must not be coerced to a vector when there is only one peak to expand.
@ColeWunderlich

I think this adds some nice features that can optionally be 'turned off' if the user wants the original functionality. I support pulling this, and am currently using it in my own code to prevent certain cases where pwid = NA, which I believe is caused by the m/z range being too narrow.

@trljcl
Contributor

trljcl commented Jun 25, 2014

Sorry, I think there are a number of issues here. If this is implemented, the defaults should be FALSE or NULL. Can someone please explain why expand.rt and expand.mz are needed in fillPeaks in the first place? Is this:

  1. A workaround to counter the shrunken mz and rt range for some peaks in the group, which is the default fillPeaks behaviour related to using group medians? The mz and rt ranges passed to getPeaks within fillPeaks are effectively medians of the group minima and maxima for the mz and rt values. This is a compromise: using the medians means half the peaks in the group have predicted mz and rt boundaries narrower than actual, and the other half wider. Adding expansion factors would be similar to using the minimum of the group minima and the maximum of the group maxima (instead of the medians); the rt and mz boundaries would then capture all peaks in the group (and this may be a better way of defining expansion factors, as they would change dynamically for each group). Predictions would be closer to actual areas, but ONLY if there are no overlapping peaks from adjacent groups, in which case overestimations would occur. I suppose this is why the median is used: it is a compromise between underestimating areas for peaks in the group and overestimating areas by including peaks outside the group.

  2. A workaround because getPeaks uses the profile matrix, and a narrow mz boundary range may actually fall within less than one profile bin? getPeaks has a default step value of 0.1, a legacy of unit-mass-resolution data with peaks picked using matchedFilter. However, if findPeaks.centWave is used on high-resolution data (so no profile matrix is used for peak picking, but a profile matrix IS generated with step = 0.1 for fillPeaks), it is perfectly possible that there are peak groups with mz widths < 0.01. Setting the xcmsSet object's @profinfo$step to a low value (e.g. 0.001) before running fillPeaks ensures a more accurate set of bins is used to estimate areas in getPeaks. It is not possible to set step as an argument to fillPeaks; it uses the value stored in the @profinfo slot.

  3. Maybe not directly relevant here, but there are integration errors if scan-to-scan times vary at all. I pointed this out in https://groups.google.com/forum/#!topic/xcms-devel/bdkvPGSWOU0 and benjiec has implemented a fix in a fork of xcms, but only for the initial peak finding done by centWave, and not for any subsequent peak filling done by getPeaks, so in this case there will be inconsistencies (see https://github.com/benjiec/xcms/tree/5532c360b02bbc823193d994d2eae9e32f985f04). I think this fix should be implemented across all findPeaks methods and getPeaks to ensure consistent integration.
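
Point 2 can be illustrated numerically: with the default profile step of 0.1, a very narrow mz window maps onto a single profile bin, so the integration boundary cannot be narrower than the bin itself (the numbers below are invented):

```r
# An mz window of width 0.01 versus a profile bin width of 0.1.
step      <- 0.1
mz_window <- c(300.002, 300.012)   # hypothetical centWave group boundaries

bins <- floor(mz_window / step)    # profile bin index of each boundary
length(unique(bins))               # 1: both boundaries fall in the same bin
```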

cheers
Tony

@stanstrup
Contributor Author

This is a workaround for when the mz and rt ranges found in the original peak picking might not be representative, however they are defined. The reasons I could come up with for why this seems to be a pervasive problem with some datasets include:

  • The picked peaks likely come from the samples with the highest intensities, so I figured the mz ranges are typically narrower than can be expected for smaller peaks.
  • If peak groups are based on only a few samples, both mz and rt ranges might not be representative of the whole dataset. That you initially have only a few well-behaved samples might be due to differences in intensities, but also, for peaks with a low number of scans, the peaks might not always be found.

As you can see from the plot above, mz ranges go all the way down to 0, so the re-integration of peaks is done on completely different terms, even if you probably wouldn't argue that the accuracy differs that much.

It is true that using the expansion factors runs the risk of integrating other peaks. But if a reasonable setting of min.width.rt, and in particular min.width.mz, creates overlap, then this peak can never be trusted anyway: you have overlap that cannot be resolved by your system. I would think that the minimum-width settings would be much less dangerous than using the group minima and maxima.

Why must the defaults be NULL/FALSE? The current defaults (expand.mz=1, expand.rt=1, min.width.mz=0, min.width.rt=0) leave the ranges unchanged and should give results identical to the current version.
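
The expansion arithmetic itself makes this easy to check: scaling a range about its centre by a factor of 1 is the identity, so the defaults cannot change any range (expand_range is an illustration, not the PR's exact code):

```r
# Scale a range symmetrically about its centre by a factor.
expand_range <- function(lo, hi, expand) {
  centre <- (lo + hi) / 2
  half   <- (hi - lo) / 2 * expand
  c(centre - half, centre + half)
}

expand_range(3400, 3420, 1)    # c(3400, 3420): unchanged, the default
expand_range(3400, 3420, 1.1)  # c(3399, 3421): 10% wider
```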

@stanstrup
Contributor Author

It looks like my first pull request (the expand.mz and expand.rt one) was accepted, but not the min.width.mz and min.width.rt pull request. Was that intended?

@stanstrup
Contributor Author

Was a decision made on this issue?

@stanstrup
Contributor Author

I'd still very much like to get this finished...
