title | author | date | output
---|---|---|---
M2ara Manual | Thomas Enzlein | 09.08.2024 |
This is M2ara, a Shiny application based on the R package MALDIcellassay, which can be found on GitHub. It is intended to detect biomarkers in cell-based MALDI assays by generating dose-response curves. The methods used were originally published by Weigt et al., 2019, and the R package implementing the methods was published together with the Nature Protocols publication of Unger et al., 2021.
This manual describes some details of the inner workings of the MALDIcellassay package and how this Shiny application (M2ara) is supposed to be used.
The following features were already part of MALDIcellassay:
- pre-processing through the MALDIquant package
- re-calibration to a single m/z (single-point re-calibration)
- normalization to a single m/z
- fitting curves to the data using the nplr package
M2ara adds the following features:
- graphical user interface
- interactive data exploration
- support for mzML data *
- calculation of quality metrics (FZ, FV, log2FC, CRS) *
- feature ranking by metric *
- principal component analysis (PCA)
- curve clustering
- outlier detection *
* The features marked with an asterisk were re-implemented in MALDIcellassay.
The blue question mark icons throughout the application can be clicked and provide further information on the specific settings.
For best results, a concentration curve should consist of at least 7 points (ideally 9). To calculate all necessary scores there should be at least two replicates (ideally 4) per concentration.
For the curve fitting to work all spectra must have an associated concentration. This concentration can be supplied in two ways:
- as filename (see below)
- as a mapping file (*.txt) containing the concentrations of the spectra in the right order, one per line.
The mapping file can be uploaded before loading the spectra using Settings -> Conc. mapping. This method can be used for both Bruker and mzML data. The concentrations need to be in the same order as the spectra, with one concentration per line, and the number of concentrations must match the number of spectra. Please do not include units or any other characters that cannot be converted to numbers.
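The rules for the mapping file can be checked before uploading. The following is a minimal sketch, not code from M2ara: the function name `parse_concentration_mapping` is hypothetical, and skipping blank lines is an assumption about how a trailing newline should be treated.

```python
from typing import List


def parse_concentration_mapping(text: str, n_spectra: int) -> List[float]:
    """Parse mapping-file content: one numeric concentration per line, in spectrum order."""
    concentrations = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        value = line.strip()
        if not value:
            continue  # assumption: blank lines (e.g. a trailing newline) are ignored
        try:
            concentrations.append(float(value))
        except ValueError:
            # Units or other characters make the line non-numeric.
            raise ValueError(
                f"line {line_no}: {value!r} is not a number (units are not allowed)"
            ) from None
    if len(concentrations) != n_spectra:
        raise ValueError(
            f"{len(concentrations)} concentrations given for {n_spectra} spectra"
        )
    return concentrations


example = "0\n0\n0.04\n0.04\n0.12\n0.12\n"
print(parse_concentration_mapping(example, n_spectra=6))
```

A line such as `0.04 uM` raises an error in this sketch, which mirrors the requirement that the file contains plain numbers only.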
This application supports Bruker flex raw data as generated by instruments of the Bruker-Flex series (e.g. RapiFleX, UltraFleX, AutoFleX). At the moment there is no support for timsTOF or SolariX data directly but import via mzML is possible.
The organization of an experiment in the Flex format needs to be as follows:
20191209/ name of the experiment
├── 0/ 1st Concentration of compound / Name of Sample 1
│ ├── 0_O13/ Measurement replicate for sample 1
│ ├── 0_O14/ Measurement replicate for sample 1
│ ├── 0_P13/ Measurement replicate for sample 1
│ └── 0_P14/ Measurement replicate for sample 1
├── 0.04/ 2nd Concentration of compound / Name of Sample 2
│ ├── 0_O15/ Measurement replicate for sample 2
│ ├── 0_O16/ Measurement replicate for sample 2
│ ├── 0_P15/ Measurement replicate for sample 2
│ └── 0_P16/ Measurement replicate for sample 2
├── 0.12/ 3rd Concentration of compound / Name of Sample 3
etc.
Briefly: Each spectrum has to reside in a folder which is named according to the concentration used to treat the cells in the respective sample. The number of measurement replicates per concentration is unlimited (should typically be at least four to compensate for artifacts from e.g. matrix heterogeneity or preparation).
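To illustrate how the folder names carry the concentration, the sketch below groups replicate folders by the numeric name of their parent folder. It is only an illustration of the naming convention, not the app's import code; the function name `group_replicates` is hypothetical and paths are passed as plain strings so that no data is needed to try it.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List


def group_replicates(spectrum_dirs: List[str]) -> Dict[float, List[str]]:
    """Group replicate folders by the concentration encoded in the parent folder name."""
    groups = defaultdict(list)
    for d in spectrum_dirs:
        path = PurePosixPath(d)
        # The parent folder name must be convertible to a number, e.g. "0.04".
        concentration = float(path.parent.name)
        groups[concentration].append(path.name)
    # Return the groups ordered by increasing concentration.
    return dict(sorted(groups.items()))


dirs = [
    "20191209/0.04/0_O15",
    "20191209/0.04/0_O16",
    "20191209/0/0_O13",
    "20191209/0/0_O14",
]
print(group_replicates(dirs))
```

A folder name that contains a unit (for example `0.04uM`) would fail the `float` conversion here, which is the same reason such names cannot be used as concentrations.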
For mzML-import the mzML files need to be named with the corresponding concentration used for treatment. Please put all technical replicates for a given concentration into the same mzML file.
20191209/ name of the experiment
├── 0.mzML 1st Concentration of compound including all replicates
├── 0.04.mzML 2nd Concentration of compound including all replicates
├── 0.12.mzML 3rd Concentration of compound including all replicates
etc.
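For mzML data the concentration is taken from the file name itself. A minimal sketch of that convention, assuming the file name without its extension is the number (the helper `concentration_from_mzml` is hypothetical and not part of M2ara):

```python
from pathlib import PurePosixPath


def concentration_from_mzml(filename: str) -> float:
    """Read the concentration from an mzML file name such as '0.04.mzML'."""
    path = PurePosixPath(filename)
    if path.suffix.lower() != ".mzml":
        raise ValueError(f"{filename!r} is not an mzML file")
    # Only the last extension is removed, so '0.04.mzML' gives the stem '0.04'.
    return float(path.stem)


for name in ["20191209/0.mzML", "20191209/0.04.mzML", "20191209/0.12.mzML"]:
    print(name, concentration_from_mzml(name))
```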
- Click on the Select folder-button (1, see figure above) and select a folder containing your experiment (see Requirements to the raw data). The following dialog is displayed:
- Click on the Load spectra-button (1) to import your spectra. Depending on the size of the experiment, loading takes between 30 seconds and 3 minutes.
- You may now change any of the settings in the sidebar (2). Once you are satisfied, click the Process spectra-button at the bottom of the sidebar.
- Analyze your data by clicking on entries in the table at the bottom left (3). The plots for the curve and the peaks will change accordingly (4). You may want to display error bars or use the slider to change the displayed m/z range (i.e. zoom) in the plot displaying the peaks. Note that you do not need to re-upload your data if you want to experiment with the settings. Just click the Process spectra-button again to redo the calculations; after a short time your results will be updated.
- If you want to save the curve fit and peak profile of a given m/z value, you can click the download button below the peak table to save your results as *.csv.
The analysis pipeline consists of the following steps (see figure below for a graphical overview):
- The folder of the experiment is selected (see Requirements to the raw data).
- The data is loaded. Note that all later steps use the data currently loaded. This means that there is no need to re-load the data if any changes are made to the settings.
- Preprocessing is applied to the raw data. This includes (in this order): smoothing using the "Savitzky Golay" method, baseline subtraction using the "Top Hat" method, square-root transformation of the intensity, and peak detection on the raw (single) spectra.
- The peaks are used to do the (single-point) recalibration on the single (continuous) spectra; additionally, the peaks themselves are also recalibrated.
- The recalibrated single peaks are used to determine the normalization factor (this only applies to the mz normalization method). The normalization is applied to the single (continuous) spectra.
- The single peaks are used to do the alignment of the single (continuous) spectra.
- Average spectra: The single (continuous) spectra are used for averaging the measurement replicates for each concentration.
- Detect peaks of average spectra.
- Intensity matrix: The peaks of the average spectra are transformed into a matrix with columns representing m/z values and rows representing concentrations, where the cells contain the respective intensity.
- Variance filtering is applied.
- Curve fitting is performed.
- Quality metrics are calculated (FV, FZ, SSMD, Log2FC, CRS).
- The peaks can be selected in the Peak table.
- The respective dose-response curve as well as the peak profile is visualized and may be saved.
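The single-point recalibration and the normalization to a single m/z from the steps above can be sketched as follows. This is an illustration under stated assumptions, not the MALDIcellassay implementation: the helper names are hypothetical, the reference peak is assumed to be the most intense peak within a tolerance window around the target m/z, the recalibration is assumed to be a constant additive shift of the m/z axis, and the normalization factor is assumed to be the intensity of the reference peak.

```python
from typing import List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def find_reference_peak(peaks: List[Peak], target_mz: float, tol: float) -> Optional[Peak]:
    """Return the most intense peak within +/- tol of target_mz, or None if there is none."""
    candidates = [p for p in peaks if abs(p[0] - target_mz) <= tol]
    return max(candidates, key=lambda p: p[1]) if candidates else None


def recalibrate(mz: List[float], peaks: List[Peak], target_mz: float, tol: float) -> List[float]:
    """Shift the whole m/z axis so that the reference peak lands on target_mz."""
    ref = find_reference_peak(peaks, target_mz, tol)
    if ref is None:
        raise ValueError("no peak found within the tolerance window")
    shift = target_mz - ref[0]
    return [m + shift for m in mz]


def normalize(intensity: List[float], peaks: List[Peak], target_mz: float, tol: float) -> List[float]:
    """Divide all intensities by the intensity of the reference peak."""
    ref = find_reference_peak(peaks, target_mz, tol)
    if ref is None:
        raise ValueError("no peak found within the tolerance window")
    return [i / ref[1] for i in intensity]


# Toy spectrum: three peaks, the reference signal is measured 0.015 too high.
mz = [759.9, 760.6, 782.5]
intensity = [50.0, 200.0, 80.0]
peaks = list(zip(mz, intensity))

print(recalibrate(mz, peaks, target_mz=760.585, tol=0.1))
print(normalize(intensity, peaks, target_mz=760.585, tol=0.1))
```

In this sketch a spectrum without a peak in the tolerance window raises an error; how the app treats such spectra is not covered here.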
The main Curve screen is intended for a univariate analysis in a peak-by-peak manner.
On the upper right, fitted curves and individual data points are shown (error bars showing the standard deviation or standard error of the mean can be displayed using the drop-down menu). This plot can be used to judge the goodness of fit and the general curve shape manually.
The upper left shows a zoom-in to the corresponding individual peaks. The level of zoom can be adjusted to either display details of the peaks or investigate the surroundings of a single peak, e.g. to judge if it is part of an isotopic envelope.
Below the two plots the peak table is shown. Here all found signals as well as all metrics are displayed. The two upper plots will change if a signal is selected.
M²ara comes with a variety of scores/metrics that are meant to help judge the quality of response curves.
In the pharmaceutical industry and in research, the quality of a bioassay is assessed by common metrics that rely on a negative and a positive control (Zhang et al., 1999; Iversen et al., 2006; Ravkin et al., 2004). However, in order to be able to explore unknown cellular drug effects in whole-cell MALDI MS bioassays and to classify m/z features as either up-, down- or non-regulated, characteristic measures need to be deduced from the concentration-response data directly. First, to assess the variability within the assay data relative to the effective window size, a modified form of the Z' factor (Zhang et al., 1999), defined by
is implemented into M²ara. The modified FZ score helps to make a judgment about the distance of the means (
A modified V' (Ravkin et al., 2004) is introduced to assess the root-mean-square deviation of the response data relative to the log-logistic model fit, determined by
with
where
The
where
The FS is based on the Strictly Standardized Mean Difference (SSMD; Bray and Carpenter, 2004; Zhang et al., 2007) and is implemented with:
In short: The FS gives the difference between the upper and lower part of the curves in units of standard deviation. In other words, it gives a weighted difference.
with
and
and
The CRS combines three measures used to describe the quality of a response curve, the effect size defined as
The metrics screen makes it possible to visualize different metrics (FZ, FV, FS, log2FC, CRS as well as pEC50, etc.) as a function of m/z. The direction of the peaks (up or down) highlights the direction of regulation (whether the intensity of the signal increases or decreases with the concentration). It is therefore useful for getting a fast overview of the whole data set. The different metrics concentrate on different aspects of the quality of the curve.
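For orientation, the two classical metrics that FZ and FS build on can be written down directly. The sketch below implements the textbook Z' factor (Zhang et al., 1999) and the SSMD for two independent groups; it does not reproduce the modified scores used in M²ara. Treating the replicates at the highest and lowest concentration as the two groups is an assumption made only for this example.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def z_prime(high: Sequence[float], low: Sequence[float]) -> float:
    """Classical Z' factor: 1 - 3 * (sd_high + sd_low) / |mean_high - mean_low|."""
    return 1 - 3 * (stdev(high) + stdev(low)) / abs(mean(high) - mean(low))


def ssmd(high: Sequence[float], low: Sequence[float]) -> float:
    """SSMD for two independent groups: mean difference in units of pooled spread."""
    return (mean(high) - mean(low)) / sqrt(stdev(high) ** 2 + stdev(low) ** 2)


# Four hypothetical replicates each at the top and bottom of a response curve.
top = [10.0, 11.0, 9.0, 10.0]
bottom = [2.0, 3.0, 1.0, 2.0]

print(round(z_prime(top, bottom), 3))
print(round(ssmd(top, bottom), 3))
```

A Z' close to 1 indicates a large window relative to the replicate scatter, while the sign of the SSMD indicates the direction of the difference between the two groups.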
The top part of the QC tab focuses on the (potential) peak used for re-calibration and enables the user to inspect the alignment of the (average) spectra per concentration.
The lower left part shows different metrics (both assay quality metrics like FZ, FV, CRS and MALDI parameters like total ion current as well as re-calibration shifts and PCA loadings) per spot in a target plate view. This functionality is currently only available for Bruker raw data and will not be visible when the mzML input file format is selected.
The lower right shows a summary of the processing (and, in the case of Bruker data, also some measurement metadata).
A PCA (principal component analysis) provides a multivariate view of the data by dimensionality reduction. Although it is hard to identify biomarkers/regulated signals with it on its own, the PCA is highly useful for judging the general concentration-dependent differences introduced by the treatment. A high separation of the different concentrations shows that some multivariate effects are in place, whereas a low separation hints at either low effects overall or effects that are unique to some single (and most likely rather small) peaks. This is why the PCA can be a useful addition to the univariate analysis featured on the Curves-screen. The PCA can be generated by clicking on the Perform PCA-button.
The drop-down menus select the PC (principal component) shown on the x- and y-axis. The sliders adjust the L1 (Lasso) and L2 (Ridge) penalty. A high L1 penalty leads to a sparse representation of the data (few non-zero loadings), making it easier to identify the factors (signals) that influence the separation shown in the scores plot. If the L1 penalty is set to 0, a normal (dense) PCA is generated.
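Why an L1 penalty produces sparse loadings can be illustrated with the soft-thresholding operator, which shrinks every loading toward zero and sets small ones exactly to zero. This is a conceptual sketch only: it is not the sparse PCA algorithm used by the app, and the loading values are made up for the example.

```python
from typing import List


def soft_threshold(loadings: List[float], l1: float) -> List[float]:
    """Shrink each loading by l1 toward zero; loadings smaller than l1 become zero."""
    out = []
    for w in loadings:
        magnitude = max(abs(w) - l1, 0.0)
        out.append(magnitude if w >= 0 else -magnitude)
    return out


dense = [0.8, -0.05, 0.1, -0.6]           # hypothetical loadings of one component
sparse = soft_threshold(dense, l1=0.1)    # a penalty of 0 would leave them unchanged

print(sparse)
print(sum(1 for w in sparse if w != 0), "non-zero loadings remain")
```

With a penalty of 0 the loadings are returned unchanged, which corresponds to the dense PCA mentioned above.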
The loadings can be used to identify peaks that have a high influence on the scores of the PCA.
Using the Summarise loadings-button, either the summarized (see figure above) or the full loadings (as a loadings vs. m/z spectrum) can be visualized. Using the Send to peak table-button, the numeric loadings can be sent to the peak table on the Curve-screen. This makes it easy to investigate whether they overlap with univariate signals of interest (high scores in Z', V' or CRS) or whether they represent a separate regulation caused by many smaller changes that are not strong enough to lead to high scores on their own.
The cluster tab makes it possible to cluster curves based on their shape in order to detect signals of interest that follow a similar direction as one (or many) target signals.
On the right, the individual (black) curves for all signals are shown together with their average curve trajectory (colored). The left plot shows all trajectories in direct comparison.
Using the slider, the user needs to adjust the number of clusters to a reasonable value. The clustering metrics shown below can help, but in the end none of these metrics is perfect and the clustering might work better for some data sets than for others. It is intended not as an analytic tool but rather as a helper to find curves with similar trajectories (e.g. to identify all signals where the intensity goes up or down with increasing concentration). The number of clusters should therefore be selected such that the average trajectories line up as well as possible with the individual curves.
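The idea of grouping curves by shape rather than by absolute intensity can be sketched with a small k-means on min-max scaled curves. This is not the clustering method of the app, which is not described here; the scaling, the distance measure and the deterministic initialization with the first k curves are all assumptions chosen to keep the example short.

```python
from typing import List, Tuple

Curve = List[float]


def min_max_scale(curve: Curve) -> Curve:
    """Scale a curve to the range 0..1 so that only its shape matters."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0] * len(curve)
    return [(v - lo) / (hi - lo) for v in curve]


def sq_dist(a: Curve, b: Curve) -> float:
    """Squared Euclidean distance between two curves of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(curves: List[Curve], k: int, n_iter: int = 20) -> Tuple[List[int], List[Curve]]:
    """Cluster scaled curves; returns a label per curve and the average trajectories."""
    scaled = [min_max_scale(c) for c in curves]
    centers = [scaled[i][:] for i in range(k)]  # assumption: first k curves as start
    labels = [0] * len(scaled)
    for _ in range(n_iter):
        # Assign each curve to the closest average trajectory.
        labels = [min(range(k), key=lambda j: sq_dist(c, centers[j])) for c in scaled]
        # Recompute each average trajectory from its members.
        for j in range(k):
            members = [c for c, lab in zip(scaled, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


curves = [
    [1, 2, 3, 4],      # increasing
    [4, 3, 2, 1],      # decreasing
    [10, 20, 30, 40],  # increasing, much higher intensity
    [8, 6, 4, 2],      # decreasing
]
labels, centers = kmeans(curves, k=2)
print(labels)
```

Because of the scaling, the two increasing curves end up in the same cluster even though their intensities differ by a factor of ten.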
The File format-menu can be used to select between Bruker raw data and the mzML format (see Requirements to the raw data).
The Conc. mapping upload button enables the upload of a mapping file containing one concentration for each spectrum. It needs to be in the *.txt format and needs to contain one concentration (do not include units!) per line, one for each spectrum. If the mapping should be used, the file needs to be uploaded before the spectra are loaded using the button on the sidebar.
The Peak window size and Peak method settings control the peak detection. Usually a Peak window size of 20 and the SuperSmoother method should lead to good results. Sometimes, especially if a small peak is close to a large one, the small peak might not be detected. In these cases the Peak window size can be decreased or, if this is still not enough, the MAD peak detection method can be chosen. Please note that both will lead to many more signals being considered valid peaks, so it makes sense to increase the SNR at the same time.
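The interplay of window size, noise estimate and SNR can be illustrated with a simplified local-maximum peak picker. This is a sketch of the general idea and not MALDIquant code: a point counts as a peak if it is the maximum within its window and exceeds SNR times a MAD-based noise estimate. Whether the app's Peak window size corresponds to the `half_window` parameter used here is an assumption.

```python
from statistics import median
from typing import List


def mad_noise(intensity: List[float]) -> float:
    """Constant noise estimate: scaled median absolute deviation of the intensities."""
    med = median(intensity)
    return 1.4826 * median(abs(v - med) for v in intensity)


def detect_peaks(intensity: List[float], half_window: int, snr: float) -> List[int]:
    """Return indices that are local maxima within +/- half_window and above snr * noise."""
    noise = mad_noise(intensity)
    peaks = []
    for i, v in enumerate(intensity):
        lo = max(0, i - half_window)
        hi = min(len(intensity), i + half_window + 1)
        # Note: on a flat plateau every point of the plateau would be reported.
        if v == max(intensity[lo:hi]) and v > snr * noise:
            peaks.append(i)
    return peaks


# Toy signal: a large peak at index 6 and a smaller one at index 12.
signal = [1, 2, 1, 2, 1, 3, 20, 3, 1, 2, 1, 2, 8, 2, 1, 2, 1]

print(detect_peaks(signal, half_window=2, snr=3))  # small window: both peaks
print(detect_peaks(signal, half_window=7, snr=3))  # large window: small peak is lost
```

With the larger window the smaller peak falls inside the window of its large neighbour and is no longer a local maximum, which is the situation described above.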
The Exclude empty spectra setting will exclude spectra that don't contain any signals.
To save results for later use, the app includes the option to save all relevant processing parameters. This can be done by clicking Settings -> Save settings. If the path to the data should also be saved, this needs to be done after setting the directory but before loading the spectra.
A file called settings.csv containing all parameters is saved in the working directory.
If such a file is found at the start-up of the app, the parameters will be loaded as defaults.
As processing is typically fast, this is more efficient (in time and disk space) than saving the complete app state including spectra and calculated values.
The curve fitting in the app is performed internally by the nplr package, which uses Richards' equation for logistic regression:
The parameters used for each single m/z can be downloaded from the app under Settings -> Save fitting param.
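As a reference for interpreting the downloaded parameters, the five-parameter logistic form documented for nplr can be evaluated as sketched below. The parameter names (`bottom`, `top`, `xmid`, `slope`, `asym`) are chosen for readability and are assumptions, as is the use of log10-transformed concentrations on the x-axis; the column names in the downloaded file may differ.

```python
def five_pl(x: float, bottom: float, top: float, xmid: float, slope: float, asym: float) -> float:
    """Five-parameter logistic: bottom + (top - bottom) / (1 + 10**(slope * (xmid - x)))**asym."""
    return bottom + (top - bottom) / (1 + 10 ** (slope * (xmid - x))) ** asym


# Hypothetical symmetric curve (asym = 1) with its inflection point at log10(conc) = -6.
params = dict(bottom=0.0, top=1.0, xmid=-6.0, slope=1.0, asym=1.0)

for log_conc in (-9.0, -6.0, -3.0):
    print(log_conc, five_pl(log_conc, **params))
```

For a symmetric curve the response at `xmid` lies halfway between the bottom and top asymptotes; an asymmetry parameter different from 1 moves it away from the midpoint.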