Add files via upload
skschum authored Oct 10, 2019
1 parent 6ebab27 commit 8d404a4
Showing 1 changed file with 18 additions and 10 deletions.
28 changes: 18 additions & 10 deletions MFAssignR/vignettes/MFAssignR Vignette.Rmd
@@ -36,7 +36,7 @@ Zheng, Q., Morimoto, M., Sato, H. and Fouquet, T.: Resolution-enhanced Kendrick
Zhurov, K. O., Kozhinov, A. N., Fornelli, L. and Tsybin, Y. O.: Distinguishing Analyte from Noise Components in Mass Spectra of Complex Samples: Where to Cut the Noise, Anal Chem, 86(7), 3308–3316, doi:10.1021/ac403278t, 2014.

##Molecular Formula (MF) Assignment
The MF assignment algorithm in MFAssign was adapted from the low mass moiety CHOFIT assignment algorithm developed by Green and Perdue (2015). In total there are 4 versions of MFAssign: MFAssign(), MFAssignCHO(), MFAssignAll(), and MFAssignAll_MSMS(). MFAssign() includes external nested loops to assign additional heteroatoms, as described in Green and Perdue (2015), while MFAssignCHO() does not. Briefly, the CHOFIT algorithm uses low mass moieties such as CH4O-1 and C4O-3 to move around in the O/C and H/C space to assign MF with C, H, and O (CHO MF). These low mass moieties efficiently assign CHO MF without conventional loops. Additional combinatorial assignments with various heteroatoms are made using nested loops that subtract the mass of a heteroatom from the measured ion mass, creating a CHO core mass, which can then be assigned using the low mass moiety CHOFIT approach. This is further explained in Green and Perdue (2015) and Perdue and Green (2015).
The MF assignment algorithm in MFAssign was adapted from the low mass moiety CHOFIT assignment algorithm developed by Green and Perdue (2015). In total there are 2 versions of MFAssign: MFAssign() and MFAssignCHO(). MFAssign() includes external nested loops to assign additional heteroatoms, as described in Green and Perdue (2015), while MFAssignCHO() does not. Briefly, the CHOFIT algorithm uses low mass moieties such as CH4O-1 and C4O-3 to move around in the O/C and H/C space to assign MF with C, H, and O (CHO MF). These low mass moieties efficiently assign CHO MF without conventional loops. Additional combinatorial assignments with various heteroatoms are made using nested loops that subtract the mass of a heteroatom from the measured ion mass, creating a CHO "core" mass, which can then be assigned using the low mass moiety CHOFIT approach. This is further explained in Green and Perdue (2015) and Perdue and Green (2015).
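
The core-mass idea described above can be illustrated with a short base R sketch. It is not the MFAssignR implementation: the helper `cho_core_mass()` and the example mass are hypothetical, only N and S are looped over, and no valence or ionization corrections are applied. It simply shows how nested loops over heteroatom counts produce candidate CHO core masses that a CHOFIT-style step could then try to assign.

```{r, eval = FALSE}
# Monoisotopic masses (Da) of a few heteroatoms
hetero_mass <- c(N = 14.003074, S = 31.972071, P = 30.973762)

# Hypothetical helper (not part of MFAssignR): subtract the given
# heteroatom counts from a mass to obtain a candidate CHO core mass
cho_core_mass <- function(mass, counts) {
  mass - sum(hetero_mass[names(counts)] * counts)
}

# Nested loops over N = 0..2 and S = 0..1 for one illustrative mass
mass <- 300.1000
cores <- data.frame()
for (n in 0:2) {
  for (s in 0:1) {
    core <- cho_core_mass(mass, c(N = n, S = s))
    cores <- rbind(cores, data.frame(N = n, S = s, core_mass = core))
  }
}
cores
```

Each row of `cores` is one heteroatom combination; in the approach described above, only core masses that receive a valid CHO assignment would lead to a reported MF.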

###MFAssign()
Using the low mass moiety and combinatorial assignment approach, MFAssign() can be used to assign MF with 12C, 1H, and 16O and a variety of heteroatoms and isotopes, including 2H, 13C, 14N, 15N, 31P, 32S, 34S, 35Cl, 37Cl, and 19F. It can also assign Na+ adducts, which are common in positive ion mode. Because the number of chemically reasonable MF grows with the number of possible elements and with molecular weight, the output provides separate lists of ambiguous and unambiguous MF.
@@ -113,9 +113,9 @@ At least one of these noise estimation functions should be run on the mass list
The SNplot function is used to show the mass spectrum with the masses below and above the cut point denoted using the same color scheme as in the histogram plots from either HistNoise() or KMDNoise().

##Internal Mass Recalibration
RecalList(), Recal(), and Recal_2() are functions pertaining to the internal mass recalibration method adapted from Kozhinov et al. (2013) and Savory et al. (2011), which uses a polynomial central moving average to estimate the weights used to recalibrate the masses (Kozhinov et al., 2013), applied to spectral segments (Savory et al., 2011). The function RecalList() can be used with the output of MFAssign() or MFAssignCHO() to generate a data frame containing potential recalibrant CH2 homologous series. A variety of metrics are included in the output of this function to aid the user in picking suitable recalibrant series; these are described in greater detail in the example of RecalList() below. The user can select up to 10 homologous series as inputs for the mass recalibration with Recal() and Recal_2(). Recal() uses H2 and O KMD and z* series to identify additional MF that are related to the user-selected recalibrants. In contrast, Recal_2() does not use those series to expand the pool of potential recalibrants, using only the peaks that correspond to the homologous series chosen as recalibrants. Other than this difference, Recal() and Recal_2() work exactly the same. To avoid recalibration problems associated with too many recalibrant masses, the function uses a user-defined number of tallest peaks within a user-defined mass range “bin”. For example, if the bin width is set at 20 and the number of peaks is set at 2, the function will select the two tallest peaks within each 20 m/z window across the range of the spectrum. Additionally, when the monoisotopic peak chosen as a recalibrant has an identified 13C peak, that isotopic peak will also be added to the pool of recalibrants being used. After the recalibrants have been selected, they are split into mass windows of a user-defined width (default is 50 m/z) and used to calculate the correction term according to the adapted form of the Kozhinov et al. method.
This will provide a different mass correction term for each mass window in the spectrum. Then the raw mass list(s) that are being recalibrated are split into the same mass windows, and the correction term that is associated with each window is used to correct the masses in that window, thus recalibrating the full spectrum section by section. In addition to the output of recalibrated mass lists the function also generates a plot that shows the recalibration peaks that were used in context with the overall mass spectrum, and produces an output data frame containing the mass, abundance, formula, and error for the recalibrants that were used.
RecalList() and Recal() are the functions pertaining to the internal mass recalibration method adapted from Kozhinov et al. (2013) and Savory et al. (2011), which uses a polynomial central moving average to estimate the weights used to recalibrate the masses (Kozhinov et al., 2013), applied to spectral segments (Savory et al., 2011). The function RecalList() can be used with the output of MFAssign() or MFAssignCHO() to generate a data frame containing potential recalibrant CH2 homologous series. A variety of metrics are included in the output of this function to aid the user in picking suitable recalibrant series; these are described in greater detail in the example of RecalList() below. The user can select up to 10 homologous series as inputs for the mass recalibration with Recal(). Option 3 in Recal() (opt = 3) uses H2 and O KMD and z* series to identify additional MF that are related to the user-selected recalibrants. In contrast, option 4 in Recal() (opt = 4) does not use those series to expand the pool of potential recalibrants, using only the peaks that correspond to the homologous series chosen as recalibrants. Other than this difference, options 3 and 4 work exactly the same. To avoid recalibration problems associated with too many recalibrant masses, the function uses a user-defined number of tallest peaks within a user-defined mass range "bin". For example, if the bin width is set at 20 and the number of peaks is set at 2, the function will select the two tallest peaks within each 20 m/z window across the range of the spectrum. Additionally, when the monoisotopic peak chosen as a recalibrant has an identified 13C peak, that isotopic peak will also be added to the pool of recalibrants being used. After the recalibrants have been selected, they are split into mass windows of a user-defined width (default is 50 m/z) and used to calculate the correction term according to the adapted form of the Kozhinov et al. method.
This will provide a different mass correction term for each mass window in the spectrum. Then the raw mass list(s) that are being recalibrated are split into the same mass windows, and the correction term that is associated with each window is used to correct the masses in that window, thus recalibrating the full spectrum section by section. In addition to the output of recalibrated mass lists the function also generates a plot that shows the recalibration peaks that were used in context with the overall mass spectrum, and produces an output data frame containing the mass, abundance, formula, and error for the recalibrants that were used.
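
The "tallest peaks per bin" selection described above can be sketched in base R as follows. This is only an illustration under stated assumptions: `select_tallest()` is a hypothetical helper, not an MFAssignR function, the column names `mass` and `abund` are made up for the example, and bins are assumed to start at multiples of the bin width, which may differ from how Recal() places its bins.

```{r, eval = FALSE}
# Hypothetical helper (not part of MFAssignR): keep the n tallest peaks
# within each m/z bin of width bin_width
select_tallest <- function(peaks, bin_width = 20, n = 2) {
  bin <- floor(peaks$mass / bin_width)
  picked <- lapply(split(peaks, bin), function(d) {
    head(d[order(d$abund, decreasing = TRUE), ], n)
  })
  out <- do.call(rbind, picked)
  rownames(out) <- NULL
  out
}

# Illustrative candidate recalibrant peaks
peaks <- data.frame(
  mass  = c(201.1, 205.3, 212.0, 219.9, 221.4, 230.2, 238.8),
  abund = c(100, 400, 250, 50, 300, 80, 500)
)
selected <- select_tallest(peaks, bin_width = 20, n = 2)
selected
```

With a bin width of 20 and two peaks per bin, the seven candidate peaks above fall into two bins, so four peaks are retained.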

RecalX() and Recal_2X() are similar to Recal() and Recal_2(), but provide some iteration of the mass calibration and can be used more effectively with small mass windows. The homologous series are chosen in the same way as in Recal() and Recal_2(), but then they are used to do a single term recalibration for the entire spectrum instead of segments. These calibrated masses are then used to do a segmented recalibration. Within each segment the recalibrants from the previous step are used and then the tallest peaks assigned a molecular formula within each window are selected as recalibrants, with half above the central recalibrant and half below.
Options 1 and 2 in Recal() are similar to options 3 and 4, but perform a two-step recalibration instead of the single-step walking calibration of options 3 and 4. The first step uses the chosen recalibrants to calculate a single recalibration term for the entire spectrum, which is applied as an initial mass error correction. The second step does a segmented (walking) calibration across the spectrum using the previous recalibrants as central peaks, around which a user-defined number of additional recalibrants are chosen, with the tallest peaks in each bin given preference. Each segment is then calibrated using these recalibrant ions, providing increased mass accuracy. This method more closely mimics the method described in Kozhinov et al. (2013).
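
The two-step idea (a global correction followed by a per-segment correction) can be sketched in base R. This is a deliberately simplified stand-in and not the method used by Recal(): the polynomial central moving average weights are replaced by a plain mean of the ppm errors, the recalibrant masses are invented for illustration, and the window width of 100 m/z is an assumption chosen so that each window contains two recalibrants.

```{r, eval = FALSE}
# Illustrative recalibrants: measured and theoretical m/z
recal <- data.frame(
  measured = c(201.0005, 249.0006, 301.0009, 349.0010),
  theor    = c(201.0000, 249.0000, 301.0000, 349.0000)
)
ppm_err <- function(measured, theor) (measured - theor) / theor * 1e6

# Step 1: a single correction term for the whole spectrum
global_ppm <- mean(ppm_err(recal$measured, recal$theor))
recal$step1 <- recal$measured / (1 + global_ppm / 1e6)

# Step 2: one correction term per mass window (width is an assumption)
width <- 100
recal$window <- floor(recal$step1 / width)
window_ppm <- tapply(ppm_err(recal$step1, recal$theor), recal$window, mean)
window_term <- as.numeric(window_ppm[as.character(recal$window)])
recal$step2 <- recal$step1 / (1 + window_term / 1e6)

# Mean absolute error (ppm) of the recalibrants before and after each step
errs <- c(
  raw   = mean(abs(ppm_err(recal$measured, recal$theor))),
  step1 = mean(abs(ppm_err(recal$step1, recal$theor))),
  step2 = mean(abs(ppm_err(recal$step2, recal$theor)))
)
errs
```

In this toy example the global step removes most of the systematic offset and the per-window step removes the residual drift across the mass range, which is the motivation for the segmented second step.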

#Function Examples
##Recommended Order of Operations
@@ -218,7 +218,7 @@ Data <- read.csv("YourMassList.csv")
#You can read in an external data set. Make sure the first column is the measured ion mass and the second column is the measured ion abundance (intensity or relative abundance).
#Be sure to include a signal-to-noise level cut value so that the function will work properly.
Mono_Iso <- IsoFiltR(Data, SN = 500, Diffrat = 0.1)
Mono_Iso <- IsoFiltR(Data, SN = 500, Diffrat = 0.1, Sulfrat = 30, Carberr = 5, Sulferr = 5)
Mono <- Mono_Iso[["Mono"]]
Iso <- Mono_Iso[["Iso"]]
@@ -231,6 +231,10 @@ Iso <- Mono_Iso[["Iso"]]

* Diffrat - a user defined ratio to tighten (larger value) or loosen (lower value) the intensity thresholds for identifying a peak as an isotopic peak; default is 0.1.

* Carberr - a user defined value to determine the required mass accuracy for matching a 12C peak to a 13C peak; default is 5 (ppm).

* Sulferr - a user defined value to determine the required mass accuracy for matching a 32S peak to a 34S peak; default is 5 (ppm).
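
To make the role of a ppm tolerance such as Carberr concrete, the following base R sketch checks whether a candidate monoisotopic mass has a peak at the expected 13C position within a given ppm error. It is an assumption-laden illustration, not the IsoFiltR() code: `has_13c_partner()` is a hypothetical helper, the masses are invented, and the abundance-ratio checks controlled by Diffrat are ignored.

```{r, eval = FALSE}
# Mass difference between 13C and 12C (Da)
C13_DIFF <- 1.0033548

# Hypothetical helper (not part of MFAssignR): for each candidate
# monoisotopic mass, is there a peak at mass + C13_DIFF within carberr ppm?
has_13c_partner <- function(mono_mass, all_mass, carberr = 5) {
  vapply(mono_mass, function(m) {
    expected <- m + C13_DIFF
    any(abs(all_mass - expected) / expected * 1e6 <= carberr)
  }, logical(1))
}

# Illustrative mass list and two candidate monoisotopic masses
masses <- c(301.0718, 302.0752, 315.0874, 316.0950)
matched <- has_13c_partner(masses[c(1, 3)], masses, carberr = 5)
matched
```

Here the first candidate has a partner about 0.15 ppm from the expected 13C position and is matched, while the second candidate's nearest peak is roughly 13 ppm away and is rejected at the 5 ppm default.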

####IsoFiltR() Output
* The output of this function is a list containing two data frames. The first data frame is “Mono” and contains the monoisotopic masses and the masses that were not classified as either monoisotopic or polyisotopic. The second data frame is “Iso,” which contains the masses identified as polyisotopic.

@@ -464,11 +468,13 @@ The output is a data frame with nine columns for user evaluation of the possible
* Series Score - the number of actual observations in each series compared to the theoretical maximum number based on the CH2 homologous series. Values closer to 1 are preferred.


###Recal() and Recal_2(), RecalX() and Recal_2X()
Recal() performs recalibration on the Mono and Iso outputs from the IsoFiltR() function and generates a mass spectrum highlighting the selected recalibrant series. The recalibration is based on the first step of the recalibration method described by Kozhinov et al. 2013, which uses a polynomial central moving average to estimate the weights used to recalibrate the masses. Additionally, the concept of a segmented “walking” recalibration from Savory et al. 2011 is used to remove systematic biases in the calibration. The recalibrated output can then be fed directly into MFAssign() for MF assignment of the recalibrated masses. Additionally, the function will output a data frame containing the recalibrants with their original mass error and the new, recalibrated mass error. To improve the mass recalibration across the studied mass range, Recal() finds additional recalibrants related by H2 or O homologous series using Kendrick mass analysis and then selects the tallest peaks within a user defined mass range. Recal_2() uses only the peaks that are part of the chosen recalibrant series, with no automatic additional peak selection. After the recalibrants are selected, the mass spectrum is split into segments of a user defined width and the recalibrants within each segment are used to recalibrate each section. For the purposes of running the functions, RecalX() and Recal_2X() are the same as Recal() and Recal_2(), with the only exception being the addition of the "num" parameter for the X versions.
###Recal()
Recal() performs recalibration on the Mono and Iso outputs from the IsoFiltR() function and generates a mass spectrum highlighting the selected recalibrant series. The recalibration is based on the recalibration method described by Kozhinov et al. 2013, which uses a polynomial central moving average to estimate the weights used to recalibrate the masses. Additionally, the concept of a segmented “walking” recalibration from Savory et al. 2011 is used to remove systematic biases in the calibration. There are four options for recalibration, chosen with the parameter "opt" and denoted 1, 2, 3, and 4. Options 1 and 2 (opt = 1 or 2) are two-step recalibration methods that do an initial single-term recalibration of the entire spectrum, followed by a second step of segmented recalibration. Options 3 and 4 are single-step segmented recalibration methods. Options 1 and 3 perform formula extension on the user-defined recalibrant series to find any superior recalibrants, while options 2 and 4 do not.

The recalibrated output can then be fed directly into MFAssign() for MF assignment of the recalibrated masses. Additionally, the function will output a data frame containing the recalibrants with their original mass error and the new, recalibrated mass error.

```{r, eval = FALSE}
Recalcheck <- Recal(df = Unambig, peaks = Mono, isopeaks = Iso, mode = "neg", SN = 500, mzRange = 50, series1 = "O4_Na_2", series2 = "O4_H_8", series3 = "O6_Na_8")
Recalcheck <- Recal(df = Unambig, peaks = Mono, isopeaks = Iso, mode = "neg", SN = 500, mzRange = 50, series1 = "O4_Na_2", series2 = "O4_H_8", series3 = "O6_Na_8", opt = 1)
Plot <- Recalcheck[["Plot"]]
Plot
@@ -478,7 +484,7 @@ List <- Recalcheck[["RecalList"]]
```

####Recal(), Recal_2(), RecalX(), and Recal_2X() input
####Recal() input
* df - the input data frame in the format of the output from MFAssign() or MFAssignCHO().

* peaks - the input data frame of two columns with measured ion mass in the first column and measured ion abundance in the second column; using our recommended sequence, this is the “Mono” output from IsoFiltR().
@@ -502,9 +508,11 @@ List <- Recalcheck[["RecalList"]]
* obs - the number of required recalibrant peaks within each bin; default is 2.

* num - sets the number of peaks on either side of defined recalibrant to choose
as additional recalibrants. Default is 5. (RecalX() and Recal_2X() only)
as additional recalibrants; default is 5.

* opt - chooses which version of Recal() to run: opt = 1 is two-step recalibration with formula extension, opt = 2 is two-step recalibration without formula extension, opt = 3 is one-step segmented calibration with formula extension, and opt = 4 is one-step segmented calibration without formula extension; default is opt = 1.
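
As a quick reference, the four opt values can be summarized in a small lookup table. The data frame `recal_options` below is purely illustrative and is not an object provided by MFAssignR; it only restates the option descriptions given above.

```{r, eval = FALSE}
# Illustrative lookup table of the Recal() opt values (not part of MFAssignR)
recal_options <- data.frame(
  opt               = 1:4,
  steps             = c("two-step", "two-step",
                        "one-step segmented", "one-step segmented"),
  formula_extension = c(TRUE, FALSE, TRUE, FALSE)
)

# Look up what a given option does, for example opt = 3
recal_options[recal_options$opt == 3, ]
```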

####Recal(), Recal_2(), RecalX(), and Recal_2X() output
####Recal() output
* Plot - mass spectrum with recalibrant series highlighted in blue with the rest of the mass spectrum in gray.

* Mono - a data frame of the recalibrated monoisotopic ion masses and their abundance, formatted for input to MFAssign().