Galaxy wrappers for the MFAssignR functions #567

Merged
merged 39 commits into from
Aug 15, 2024
Changes from 36 commits
fc6606f
initial commit of the noise estimation functions
KristinaGomoryova Aug 7, 2024
344abd3
wrapper for the MFAssignCHO function
KristinaGomoryova Aug 7, 2024
1289de2
wrapper for the mfassign function
KristinaGomoryova Aug 7, 2024
831a300
help section adjusted
KristinaGomoryova Aug 7, 2024
702b128
recalibration tools wrappers
KristinaGomoryova Aug 7, 2024
52cdb72
macro updated
KristinaGomoryova Aug 7, 2024
1698c28
write.table with tab as separator
KristinaGomoryova Aug 7, 2024
3f92d16
description updated
KristinaGomoryova Aug 7, 2024
46cd625
help section updated
KristinaGomoryova Aug 7, 2024
efc1467
label adjusted
KristinaGomoryova Aug 7, 2024
62d732d
macro updated
KristinaGomoryova Aug 7, 2024
c805110
input moved from macro to recallist.xml
KristinaGomoryova Aug 7, 2024
3c60aa8
series defined automatically
KristinaGomoryova Aug 7, 2024
ca48371
by ${tool.name} on ${on_string} added
KristinaGomoryova Aug 8, 2024
4890523
help text updated
KristinaGomoryova Aug 8, 2024
3903ded
inputs separated from macro
KristinaGomoryova Aug 8, 2024
1a144b8
macro updated
KristinaGomoryova Aug 8, 2024
c38ab7c
by ${tool.name} on ${on_string} added
KristinaGomoryova Aug 8, 2024
ead6bcc
help done via help.xml
KristinaGomoryova Aug 8, 2024
624badd
typo corrected
KristinaGomoryova Aug 8, 2024
de72a8b
Merge branch 'master' into mfassignr_kmdnoise
hechth Aug 8, 2024
85f3672
edam topics moved back to individual xml
KristinaGomoryova Aug 8, 2024
84cce27
edam_data removed
KristinaGomoryova Aug 8, 2024
3485968
fixed linting
hechth Aug 8, 2024
46c678e
replaced tool version with token
hechth Aug 8, 2024
b8aa695
profile="23.0" added
KristinaGomoryova Aug 8, 2024
2dbdd36
typo corrected
KristinaGomoryova Aug 8, 2024
828c48d
updated command section
hechth Aug 9, 2024
23c3773
fixed wrong script name
hechth Aug 9, 2024
5092f4b
updated default
hechth Aug 13, 2024
c825c0f
Added first section of tests
hechth Aug 15, 2024
364f430
finalized tests
hechth Aug 15, 2024
ed467af
removed file and changed check to size
hechth Aug 15, 2024
59c4d55
updated recal mzplot assertion
hechth Aug 15, 2024
bcb5b47
made testing for recallist less stringent
hechth Aug 15, 2024
485b12a
test updates for CI
hechth Aug 15, 2024
fff5199
updated test and data
hechth Aug 15, 2024
c7b13fa
updated recalibrants
hechth Aug 15, 2024
7ca71fd
Update tools/mfassignr/mfassignr_isofiltr.xml
hechth Aug 15, 2024
168 changes: 168 additions & 0 deletions tools/mfassignr/help.xml
@@ -0,0 +1,168 @@
<macros>

<token name="@GENERAL_HELP@">
General Information
===================

Overview
--------

MFAssignR is an R package for the molecular formula (MF) assignment of ultrahigh resolution mass spectrometry measurements. It contains functions for noise assessment, isotope filtering, internal mass recalibration, and MF assignment.

The MFAssignR package was originally developed by Simeon Schum et al. (2020); the source code can be found on `GitHub`_.
Please submit any Galaxy-related bug reports as `issues`_ on the repository.

.. _GitHub: https://github.com/skschum/MFAssignR
.. _issues: https://github.com/RECETOX/galaxytools/issues


Workflow
--------

.. image:: https://github.com/RECETOX/MFAssignR/raw/master/overview.png
:width: 1512
:height: 720
:scale: 60
:alt: A picture of a workflow diagram.

The recommended workflow for running the MFAssignR package is as follows:

(1) Run KMDNoise() to determine the noise level for the data.
(2) Check effectiveness of S/N threshold using SNplot().
(3) Use IsoFiltR() to identify potential 13C and 34S isotope masses.
(4) Using the S/N threshold, and the two data frames output from IsoFiltR(), run MFAssignCHO() to assign MF with C, H, and O to assess the mass accuracy.
(5) Use RecalList() to generate a list of the potential recalibrant series.
(6) After choosing recalibrant series, use Recal() to recalibrate the mass lists.
(7) Assign MF to the recalibrated mass list using MFAssign().
(8) Check the output plots from MFAssign() to evaluate the quality of the assignments.

For detailed documentation on the individual steps please see the individual tool wrappers.
</token>

<token name="@KMDNOISE_HELP@">
MFAssignR - KMDNoise
=============================

This tool is the first step of the MFAssignR workflow (it can be substituted by HistNoise or run in parallel with it).

KMDNoise is a Kendrick Mass Defect (KMD) approach to noise estimation. It selects a subset of the data using the linear equation y = 0.001123x + b, where y is the KMD value, x is the measured ion mass, and b is the y-intercept. The default y-intercepts of 0.05 and 0.2 isolate the largest analyte-free region of noise in most mass spectra. The intensities of the peaks within this “slice” are averaged, and that value is defined as the noise level of the mass spectrum. This value is then multiplied by a user-defined signal-to-noise ratio (typically 3-10) to remove low-intensity m/z values.

Output:

- noise estimate - this noise level can then be multiplied by a user-chosen value (e.g. 3, 6, or 10) to set the signal-to-noise cut for formula assignment
- KMD plot - bounds of the noise estimation area are highlighted in red
</token>

<token name="@HISTNOISE_HELP@">
MFAssignR - HistNoise
=============================

This tool is the first step of the MFAssignR workflow (it can be substituted by KMDNoise or run in parallel with it; the next step is SNplot).

HistNoise creates a histogram of the natural log of the intensities, which is used to estimate the noise level of the data being analyzed. The estimated noise level can then be multiplied by a user-chosen value to obtain the intensity cut applied to the data.

Output:

- noise estimate - this noise level can then be multiplied by a user-chosen value to set the signal-to-noise cut for formula assignment
- Histogram - shows where the cut is being applied

</token>

<token name="@SNPLOT_HELP@">
MFAssignR - SNplot
=============================

This tool is the second step of the MFAssignR workflow (KMDNoise -> SNplot -> IsoFiltR).

SNplot plots the mass spectrum with the peaks colored according to the S/N cut (red indicates noise, blue indicates signal). This gives a qualitative view of the effectiveness of the S/N cut being used.

Output:

- SNplot - S/N colored mass spectrum showing where the cut is being applied
</token>

<token name="@ISOFILTR_HELP@">
MFAssignR - IsoFiltR
=============================

This tool is the third step of the MFAssignR workflow (SNplot -> IsoFiltR -> MFAssignCHO).

IsoFiltR identifies and separates likely isotopic masses from monoisotopic masses in a mass list. This should be done prior to formula assignment to reduce incorrect formula assignments.

Output:

- A dataframe of monoisotopic and non-matched masses
- A dataframe of isotopic masses
</token>

<token name="@MFASSIGNCHO_HELP@">
MFAssignR - MFAssignCHO
=============================

This tool is the fourth step of the MFAssignR workflow (IsoFiltR -> MFAssignCHO -> RecalList).

MFAssignCHO is a simplified version of the MFAssign function that only assigns MF with C, H, and O. It is useful for preliminary MF assignments prior to the selection of internal recalibration ions with RecalList and Recal.

Output:

- Unambig - data frame containing unambiguous assignments
- Ambig - data frame containing ambiguous assignments
- None - data frame containing unassigned masses
- MSAssign - ggplot of mass spectrum highlighting assigned/unassigned
- Error - ggplot of the Error vs. m/z
- MSgroups - ggplot of mass spectrum colored by molecular group
- VK - ggplot of van Krevelen plot, colored by molecular group
</token>

<token name="@RECALLIST_HELP@">
MFAssignR - RecalList
=============================

This tool is the fifth step of the MFAssignR workflow (MFAssignCHO -> RecalList -> Recal).

The RecalList() function identifies the homologous series that could be used for recalibration. Its input is the output of the MFAssign() or MFAssignCHO() function. It returns a dataframe of the CH2 homologous series that contain more than 3 members.

Output:

- Dataframe that contains the CH2 homologous series that contain more than 3 members.
</token>

<token name="@RECAL_HELP@">
MFAssignR - Recal
=============================

This tool is the sixth step of the MFAssignR workflow (RecalList -> Recal -> MFAssign).

The Recal() function recalibrates the 'Mono' and 'Iso' outputs from the IsoFiltR() function and prepares a dataframe containing the chosen recalibrants. It also outputs a plot for the qualitative assessment of the recalibrants. The input to the function is the output from MFAssign() or MFAssignCHO().

It is important for recalibrant masses to cover the entire mass range of interest, and they should be among the most abundant peaks in their region of the spectrum. By default, the first 10 recalibrant series are used. We recommend sorting the Recalibration Series table by Series Score (largest to smallest). If the error "Gap in recalibrant coverage, try adding more recalibrant series" occurs, we recommend providing more diverse series.

Output:

- Mass spectrum
- Recalibrated dataframe of monoisotopic masses
- Recalibrated dataframe of isotopic masses
- Recalibrants list
</token>

<token name="@MFASSIGN_HELP@">
MFAssignR - MFAssign
=============================

This tool is the last step of the MFAssignR workflow (Recal -> MFAssign).

MFAssign() assigns molecular formulas to the recalibrated mass list produced by Recal(). MFAssignCHO() is the simplified variant restricted to C, H, and O. The output plots of MFAssign() should be checked to evaluate the quality of the assignments.

Output:

- Unambig - data frame containing unambiguous assignments
- Ambig - data frame containing ambiguous assignments
- None - data frame containing unassigned masses
- MSAssign - ggplot of mass spectrum highlighting assigned/unassigned
- Error - ggplot of the Error vs. m/z
- MSgroups - ggplot of mass spectrum colored by molecular group
- VK - ggplot of van Krevelen plot, colored by molecular group
</token>
</macros>
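As a reading aid for the KMDNoise help text in help.xml, the slice selection can be sketched in Python. This is a minimal sketch under stated assumptions, not the MFAssignR implementation: the KMD convention (rounded mass minus Kendrick mass, with the CH2 scaling 14/14.01565) and the function names `kendrick_mass_defect` and `kmd_noise` are assumptions made for illustration. The slope 0.001123 and the default intercepts 0.05 and 0.2 are taken from the `upper_y`/`lower_y` parameter help in macros.xml.

```python
from statistics import mean

KENDRICK_FACTOR = 14.0 / 14.01565  # CH2 Kendrick scaling (assumed convention)
SLOPE = 0.001123                    # slope quoted in the upper_y/lower_y help


def kendrick_mass_defect(mass):
    """KMD as rounded mass minus Kendrick mass (assumed sign convention)."""
    return round(mass) - mass * KENDRICK_FACTOR


def kmd_noise(peaks, lower_y=0.05, upper_y=0.2, lower_x=None, upper_x=None):
    """Average the intensity of peaks lying between the two bounding lines.

    peaks is a list of (mass, intensity) pairs. The x limits default to the
    minimum and maximum mass, mirroring the lower_x/upper_x help text.
    """
    masses = [mass for mass, _ in peaks]
    lo_x = min(masses) if lower_x is None else lower_x
    hi_x = max(masses) if upper_x is None else upper_x
    in_slice = [
        intensity
        for mass, intensity in peaks
        if lo_x <= mass <= hi_x
        and SLOPE * mass + lower_y
        < kendrick_mass_defect(mass)
        < SLOPE * mass + upper_y
    ]
    if not in_slice:
        raise ValueError("no peaks fall inside the noise slice")
    return mean(in_slice)


if __name__ == "__main__":
    # Made-up peaks: the second one lies below the lower line and is ignored.
    example = [(200.9, 10.0), (200.1, 1000.0), (300.85, 30.0)]
    noise = kmd_noise(example)
    print("noise estimate:", noise)
    print("intensity cut at S/N 6:", 6 * noise)
```

The returned value corresponds to the "noise estimate" output; multiplying it by the chosen S/N ratio gives the intensity cut used in the later steps.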
91 changes: 91 additions & 0 deletions tools/mfassignr/macros.xml
Expand Up @@ -14,6 +14,16 @@

<xml name="creator">
<creator>
<person
givenName="Kristina"
familyName="Gomoryova"
url="https://github.com/KristinaGomoryova"
identifier="0000-0003-4407-3917" />
<person
givenName="Helge"
familyName="Hecht"
url="https://github.com/hechth"
identifier="0000-0001-6744-996X" />
<person
givenName="Zargham"
familyName="Ahmad"
Expand All @@ -26,6 +36,87 @@
</creator>
</xml>

<xml name="kmdnoise_param">
<param name="input_file" type="data" format="tabular" label="Input data"
help= "Input data frame, first column is mass, second column is intensity"/>
<param name="upper_y" type="float" label="upper limit for the y intercept" value="0.2"
help= "The upper y-intercept value to isolate noise peaks in the equation for the KMD plot: y = 0.001123*x + b. Default value is set to 0.2, so that it does not interact with any potentially double-charged peaks."/>
<param name="lower_y" type="float" label="lower limit for the y intercept" value="0.05"
help="The lower y-intercept value to isolate noise peaks in the equation for the KMD plot: y = 0.001123*x + b. Default value is set to 0.05 to ensure no analyte peaks are incorporated into the noise estimation."/>
<param name="upper_x" optional="true" type="float" label="upper limit for the x intercept"
help="If not set, it defaults to maximum mass in the mass spectrum."/>
<param name="lower_x" optional="true" type="float" label="lower limit for the x intercept"
help="If not set, it defaults to minimum mass in the mass spectrum."/>
</xml>

<xml name="histnoise_param">
<param name="input_file" type="data" format="tabular" label="Input data"
help= "Input data frame, first column is mass, second column is intensity"/>
<param name="SN" type="float" label="signal-to-noise threshold" value="0"
help= "A numeric value for situations where a predefined noise value is desired, default is 0"/>
<param name="bin" type="float" label="binwidth of the histogram" value="0.01"
help= "A numeric value determining the binwidth of the histogram, default is 0.01"/>
</xml>

<xml name="noise_threshold_params">
<param name="sn_ratio" type="float" label="SN ratio" value="6"
help= "Noise multiplier. Recommended value is 6."/>
<param name="kmdn" type="float" label="Estimated noise" value="0"
help= "Estimated noise, either from the KMDNoise or HistNoise function."/>
</xml>

<xml name="snplot_param">
<param name="input_file" type="data" format="tabular" label="Input data"
help= "Input data frame, first column is mass, second column is intensity"/>
<param name="cut" type="float" label="cut"
help= "A numeric value of the intensity cut value being investigated"/>
<param name="mass" type="float" label="mass"
help= "A numeric value setting a centerpoint to look at the mass spectrum"/>
<param name="window_x" type="float" label="window.x" value="0.5"
help= "A numeric value setting the +/- range around the mass centerpoint, default is 0.5"/>
<param name="window_y" type="float" label="window.y" value="10"
help= "A numeric value setting the y axis value for the plot, determined by multiplying the cut by this value"/>
</xml>

<xml name="ionmode_param">
<param name="ionmode" type="select" display="radio" label="Ion mode" help= "The ionization mode.">
<option value="neg" >negative</option>
<option value="pos" selected="true">positive</option>
</param>
</xml>

<xml name="mfassign_param">
<param name="ppm_err" type="integer" label="ppm_err"
help= "Error tolerance (ppm) for formula assignment" value="3"/>
<expand macro="ionmode_param" />
<expand macro="noise_threshold_params" />
<param name="lowMW" type="float" label="Lower limit of molecular mass to be assigned" value="100"
help= "Lower limit of molecular mass to be assigned."/>
<param name="highMW" type="float" label="Upper limit of molecular mass to be assigned" value="1000"
help= "Upper limit of molecular mass to be assigned."/>
</xml>

<xml name="recal_param">
<param name="input_file" type="data" format="tabular" label="Input data (Output from MFAssign)"
help= "Input data frame, the output from MFAssign or MFAssignCHO"/>
<param name="series" type="data" format="tabular" label="Calibration series (Output from RecalList)"
help= "Calibration series (Output from RecalList). At maximum the first 10 rows are used."/>
<param name="peaks" type="data" format="tabular" label="Peaks dataframe (Mono from IsoFiltR)"
help= "Peaks data frame, the Mono output from IsoFiltR"/>
<param name="isopeaks" type="data" format="tabular" label="Isopeaks dataframe (Iso from IsoFiltR)"
optional="true" help= "Isopeaks data frame, the Iso output from IsoFiltR"/>
<expand macro="ionmode_param" />
<expand macro="noise_threshold_params" />
<param name="mzRange" type="float" label="Mass windows used for the segmented recalibration" value="30"
help= "Mass windows used for the segmented recalibration"/>
<param name="step_O" type="float" label="Number of oxygen steps for formula extension" value="3"
help= "Number of oxygen steps for formula extension"/>
<param name="step_H2" type="float" label="Number of H2 steps for formula extension" value="5"
help= "Number of H2 steps for formula extension"/>
<param name="CalPeak" type="float" label="Maximum allowed recalibrant peaks per mzRange defined segment" value="150" help= "Maximum allowed recalibrant peaks per mzRange defined segment"/>

</xml>

<xml name="isofiltr_param">
<param name="peaks" type="data" format="tabular" label="Input Peak Data"
help="The input data frame containing abundance and peak mass."/>
Expand Down
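The `series` parameter help in macros.xml states that at most the first 10 rows of the RecalList table are used, and the Recal help text recommends sorting that table by Series Score, largest first. A minimal sketch of that preprocessing follows. It assumes a tab-separated table with a header row and a column named `Series Score`; the column name, the helper name `top_series`, and the series labels in the example are hypothetical and not taken from the wrapper.

```python
import csv
import io


def top_series(tabular_text, score_column="Series Score", limit=10):
    """Return the `limit` highest-scoring rows of a tab-separated table."""
    rows = list(csv.DictReader(io.StringIO(tabular_text), delimiter="\t"))
    # Sort numerically, largest score first, then keep the leading rows.
    rows.sort(key=lambda row: float(row[score_column]), reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    # Made-up table; real RecalList output has more columns.
    table = "Series\tSeries Score\nA\t1.5\nB\t9.0\nC\t4.2\n"
    for row in top_series(table, limit=2):
        print(row["Series"], row["Series Score"])
```

Sorting before truncating matters here: if the table is passed unsorted, the first 10 rows are not necessarily the best-scoring series.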
53 changes: 53 additions & 0 deletions tools/mfassignr/mfassignr_histnoise.xml
@@ -0,0 +1,53 @@
<tool id="mfassignr_histnoise" name="MFAssignR HistNoise" version="@TOOL_VERSION@+galaxy0" profile="23.0">
<description>Noise level assessment using HistNoise</description>
<macros>
<import>macros.xml</import>
<import>help.xml</import>
</macros>
<edam_topics>
<edam_topic>topic_3172</edam_topic>
</edam_topics>
<expand macro="creator" />
<expand macro="requirements" />
<command detect_errors="exit_code"><![CDATA[
Rscript '${run_script}'
]]></command>
<configfiles>
<configfile name="run_script"><![CDATA[
df <- read.delim("$input_file", sep="\t")
assess_noise <- MFAssignR::HistNoise(
df = df,
SN = $SN,
bin = $bin
)
noise <- assess_noise[['Noise']]
write.table(noise, file = '$Noise', row.names= FALSE, col.names = FALSE)
ggplot2::ggsave(filename = "histplot.png", assess_noise[['Hist']])
]]></configfile>
</configfiles>
<inputs>
<expand macro="histnoise_param"/>
</inputs>
<outputs>
<data name="Noise" format="txt" label="Noise level estimate by ${tool.name} on ${on_string}"/>
<data name="Hist_plot" format="png" label="Histogram plot by ${tool.name} on ${on_string}" from_work_dir="histplot.png"/>
</outputs>
<tests>
<test>
<param name="input_file" value="QC1_1_POS_500.tabular" />
<output name="Noise" ftype="txt">
<assert_contents>
<has_text text="674849323.854921" />
</assert_contents>
</output>
<output name="Hist_plot" ftype="png" file="histnoise/plot.png">
</output>
</test>
</tests>
<help><![CDATA[
@HISTNOISE_HELP@

@GENERAL_HELP@
]]></help>
<expand macro="citations" />
</tool>
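The `Noise` output of this tool is written without row or column names, so it holds a single number. The help text says this value is multiplied by a user-chosen S/N ratio to obtain the intensity cut. A minimal sketch of that downstream step follows; the helper names `intensity_cut` and `split_by_cut` are hypothetical, the default ratio of 6 is the value recommended in the `sn_ratio` parameter help, and treating peaks exactly at the cut as noise is an assumption.

```python
def intensity_cut(noise_text, sn_ratio=6.0):
    """Parse the single-value noise file content and scale it by the S/N ratio."""
    noise = float(noise_text.split()[0])
    return noise * sn_ratio


def split_by_cut(peaks, cut):
    """Split (mass, intensity) pairs into signal (above cut) and noise (the rest)."""
    signal = [peak for peak in peaks if peak[1] > cut]
    noise = [peak for peak in peaks if peak[1] <= cut]
    return signal, noise


if __name__ == "__main__":
    cut = intensity_cut("100.0\n", sn_ratio=6.0)
    signal, noise = split_by_cut([(150.1, 50.0), (151.2, 900.0)], cut)
    print("cut:", cut)
    print("signal peaks:", signal)
    print("noise peaks:", noise)
```

This mirrors what SNplot visualizes: peaks above the cut are treated as signal and the rest as noise.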
19 changes: 11 additions & 8 deletions tools/mfassignr/mfassignr_isofiltr.xml
@@ -1,17 +1,20 @@
<tool id="mfassignr_isofiltr" name="MFAssignR IsoFiltR" version="@TOOL_VERSION@+galaxy0">
<description>
IsoFiltR separates likely isotopic masses from monoisotopic masses in a mass list.
</description>
<tool id="mfassignr_isofiltr" name="MFAssignR IsoFiltR" version="@TOOL_VERSION@+galaxy0" profile="23.0">
<description> Separates likely isotopic masses from monoisotopic masses in a mass list. </description>
<macros>
<import>macros.xml</import>
</macros>
<edam_topics>
<edam_topic>topic_3172</edam_topic>
</edam_topics>
<edam_operations>
<edam_operation>operation_3629</edam_operation>
</edam_operations>
<expand macro="creator" />
<expand macro="refs" />

<expand macro="requirements" />
<command detect_errors="exit_code"><![CDATA[
Rscript
-e 'source("${mfassignr_isofiltr}")'
Rscript '${mfassignr_isofiltr}'
]]>
</command>
<configfiles>
Expand Down Expand Up @@ -40,8 +43,8 @@
<tests>
<test>
<param name="peaks" value="QC1_1_POS_500.tabular" ftype="tabular"/>
<output name="mono_out" file="isofiltr_output1.tabular" ftype="tabular"/>
<output name="iso_out" file="isofiltr_output2.tabular" ftype="tabular"/>
<output name="mono_out" file="isofiltr/mono_out.tabular" ftype="tabular"/>
<output name="iso_out" file="isofiltr/iso_out.tabular" ftype="tabular"/>
</test>
</tests>
<help>
Expand Down