diff --git a/DESCRIPTION b/DESCRIPTION index eaee3ef..6c12af2 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: M2ara Title: A shiny GUI to explore concentration-responses in MALDI-based assays -Version: 1.4.0 +Version: 1.4.1 Authors@R: person("Thomas", "Enzlein", , "t.enzlein@hs-mannheim.de", role = c("aut", "cre"), comment = c(ORCID = "0000-0003-1789-4090")) diff --git a/README.md b/README.md index 280da67..d1956ac 100644 --- a/README.md +++ b/README.md @@ -52,7 +52,7 @@ docker run -p 3838:3838 -v c:/path/to/massSpecData:/mnt thomasenzlein/m2ara:mai ### Stand-alone installer for Windows Use the stand-alone installer (Windows only, no R installation needed). -The installer can be downloaded [here](https://github.com/CeMOS-Mannheim/M2ara/releases/download/1.2/MALDIcellassay_1.2.exe). +The installer can be downloaded [here](https://github.com/CeMOS-Mannheim/M2ara/releases/download/1.4.1/M2ara_1.4.1.exe). ## Example data To test the app please use the example data on [FigShare](https://dx.doi.org/10.6084/m9.figshare.25736541). @@ -75,6 +75,8 @@ To replicate the results shown use the following parameters: - set binning tolerance to 100 ppm - select the folder `mzML` (parent folder of the mzML files) from the .zip file, please make sure that no other files are in this folder. +Alternatively, copy [this file](https://github.com/CeMOS-Mannheim/M2ara/blob/main/tests/testthat/settings_mzML_data.csv) as `settings.csv` into the main folder of the app. + The target *m/z* is 349.11 (E3S, [M-H]-) the pIC50 value should be 6.1. #### Weigt2018_BCR-ABL_inhibition_Dasatinib_BrukerFlex.zip @@ -95,5 +97,7 @@ To replicate the results shown use the following parameters: - set binning tolerance to 100 ppm - select the the folder `curve` from the .zip file, make sure no other files/folders are present.
+Alternatively, copy [this file](https://github.com/CeMOS-Mannheim/M2ara/blob/main/tests/testthat/settings_bruker_data.csv) as `settings.csv` into the main folder of the app. + The target is *m/z* 826.5722 (PC(36:1) [M+K]+) and *m/z* 616.1767 (Heme B [M+H]+) the pIC50 values should be 9.5 and 9.7. diff --git a/app.R b/app.R index e7f4a4b..8611d72 100644 --- a/app.R +++ b/app.R @@ -1,11 +1,11 @@ -# check if all required packages are installed -source("functions/checkInstalledPackages.R") -checkInstalledPackages(req_file = "req.txt") - -knit("manual.Rmd", quiet = TRUE) - -source("components/ui.R") -source("components/server.R") - -# Run the application -shinyApp(ui = ui, server = server) +# check if all required packages are installed +source("functions/checkInstalledPackages.R") +checkInstalledPackages(req_file = "req.txt") + +knit("manual.Rmd", quiet = TRUE) + +source("components/ui.R") +source("components/server.R") + +# Run the application +shinyApp(ui = ui, server = server) diff --git a/components/mainpanel.R b/components/mainpanel.R index 912a94b..48c94c7 100644 --- a/components/mainpanel.R +++ b/components/mainpanel.R @@ -67,7 +67,7 @@ appMainPanel <- function(defaults) { ) ), #### Manual tab #### - tabPanel("Manual", htmltools::includeMarkdown("manual.md") + tabPanel("Manual", withMathJax(htmltools::includeMarkdown("manual.md")) ) ) ) diff --git a/manual.Rmd b/manual.Rmd index 8b716dd..dcbf741 100644 --- a/manual.Rmd +++ b/manual.Rmd @@ -2,7 +2,9 @@ title: "M2ara Manual" author: "Thomas Enzlein" date: "09.08.2024" -output: html_document +output: + html_document: + mathjax: "//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" --- ```{r setup, include=FALSE} @@ -27,7 +29,7 @@ The following features were already part of `MALDIcellassay`: - graphical user interface - interactive data exploration - support for [mzML](#mzml) data \* -- calculation of quality metrics (Z', V', log2FC, CRS) \* +- calculation of quality metrics (*Z'*, *V'*,
*log2FC*, *CRS*) \* - feature ranking by metric \* - principle component analysis (PCA) - curve clustering @@ -35,15 +37,23 @@ The following features were already part of `MALDIcellassay`: \* The features marked with a asterisks were re-implemented to `MALDIcellassay`. -## General information +# General information The blue question mark icons (`r icon("question-circle")`) throughout the application can be clicked and provide further information on the specific settings. -## Requirements to the raw data {#requirements-to-the-raw-data} +# Requirements to the raw data {#requirements-to-the-raw-data} -For the curve fitting to work all spectra must have an associated concentration. This concentration can be supplied in two ways: 1. as filename (see below) 2. as mapping file (\*.txt) containing the concentrations of the spectra in the right order one per line. Concentration mapping file The mapping file can be uploaded before loading the spectra using Settings -\> Conc. mapping. This method can be used for [Bruker](#bruker) and [mzML](#mzml). As said the concentrations need to be in the right order, one concentration per line and the number of concentrations must match the number of spectra. Also please don't use any units or other characters (which cant be converted to numbers). +For best results, a concentration curve should consist of at least 7 points (better 9 points). +To calculate all necessary [Scores](#scores) there should be at least two replicates (better 4) per concentration. -### Bruker Flex format (\*.fid) {#bruker} +For the curve fitting to work all spectra must have an associated concentration. This concentration can be supplied in two ways: +1. as filename (see below) +2. as mapping file (\*.txt) containing the concentrations of the spectra in the right order one per line. + +## Concentration mapping file +The mapping file can be uploaded before loading the spectra using Settings -\> Conc. mapping. 
This method can be used for [Bruker](#bruker) and [mzML](#mzml). As stated above, the concentrations need to be in the right order, one concentration per line, and the number of concentrations must match the number of spectra. Also, please don't use any units or other characters that can't be converted to numbers. + +## Bruker Flex format (\*.fid) {#bruker} This application supports Bruker flex raw data as generated by instruments of the Bruker-Flex series (e.g. RapiFleX, UltraFleX, AutoFleX). At the moment there is no support for timsTOF or SolariX data directly but import via [mzML](#mzml) is possible. @@ -67,7 +77,7 @@ etc. Briefly: Each spectrum has to reside in a folder which is named according to the concentration used to treat the cells in the respective sample. The number of measurement replicates per concentration is unlimited (should typically be at least four to compensate for artifacts from e.g. matrix heterogeneity or preparation). -### mzML {#mzml} +## mzML {#mzml} For mzML-import the mzML files need to be named with the corresponding concentration used for treatment. Please put all technical replicates for a given concentration into the same mzML file. @@ -79,7 +89,7 @@ For mzML-import the mzML files need to be named with the corresponding concentra etc. ``` -## Step-by-step {#curve-screen} +# Step-by-step {#curve-screen} ```{r, echo=FALSE, out.width="66%", fig.cap="Interface of the app"} knitr::include_graphics("figures/interface.png") @@ -96,7 +106,7 @@ knitr::include_graphics("figures/interface.png") 5. If you want to save the curve fit and peak profile of a given *m/z*-value you can click the download button below the peak table to save your results as \*.csv. -## Analysis pipeline +# Analysis pipeline The analysis pipeline consist of the following steps (see figure below for a graphical overview): @@ -111,7 +121,7 @@ The analysis pipeline consist of the following steps (see figure below for a gra 9.
`Intensity matrix`: The peaks of the average spectra are transformed into a matrix with columns representing *m/z* values and rows representing concentrations whereas cells contain the respective intensity. 10. `Varience filtering` is applied. 11. `Curve fitting` is performed. -12. `Quality metrics` are calculated (V', Z', SSMD, Log2FC, CRS). +12. `Quality metrics` are calculated (*V'*, *Z'*, *SSMD*, *Log2FC*, *CRS*). 13. The peaks can be selected in the `Peak table`. 14. The respective dose-response curve as well as the peak profile is visualized and might be saved. @@ -119,11 +129,11 @@ The analysis pipeline consist of the following steps (see figure below for a gra knitr::include_graphics("figures/pipeline.png") ``` -## Individual screens +# Individual screens -### Main Tab +## Main Tab -#### Curve subtab +### Curve subtab The main [Curve](#curve-screen) screen is intended for a univariate analysis in a peak-by-peak manner. @@ -133,15 +143,97 @@ The upper left show's a zoom-in to the corresponding individual peaks. The level Below the two plots the peak table is shown. Here all found signals as well as all metrics are displayed. The two upper plots will change if a signal is selected. -#### Metrics subtab +#### Scores + +**M²ara** comes with a variety of scores/metrics that help judge the quality of response curves. + +##### Modified Z': + +In the pharmaceutical industry and in research, the quality of a bioassay is assessed by common metrics that rely on negative and positive controls [Zhang et al., 1999](https://pubmed.ncbi.nlm.nih.gov/10838414/), [Iversen et al., 2006](https://doi.org/10.1177/1087057105285610), [Ravkin et al., 2004](http://www.ravkin.net/articles/5322-7.pdf). However, to explore unknown cellular drug effects in whole-cell MALDI MS bioassays and to classify *m/z* features as up-, down- or non-regulated, characteristic measures need to be derived directly from the concentration-response data.
First, to assess the variability within the assay data relative to the effective window size, a modified form of the *Z'* factor [Zhang et al., 1999](https://pubmed.ncbi.nlm.nih.gov/10838414/), defined by + +$$ +Z'_{mod.} = 1-\frac{3*(\sigma_u+\sigma_l)}{|\mu_u-\mu_l|} +$$ + +is implemented in **M²ara**. The modified *Z'* score relates the distance between the means ( $\mu$ , larger is better) to the standard deviations ( $\sigma$ , smaller is better) of the upper ( $_u$ ) and lower ( $_l$ ) ends of the curve. + +##### Modified V': + +The modified *V'* [Ravkin et al., 2004](http://www.ravkin.net/articles/5322-7.pdf) is introduced to assess the root-mean-square deviation of the response data relative to the log-logistic model fit, determined by + +$$ +V'_{mod.}=1-6*\frac{\sigma_f}{|a_u-a_l|} +$$ + +with + +$$ +\sigma_f=\sqrt{\frac{1}{N}\sum(f_{exp}-f)^2} +$$ + +where $\sigma_f$ is the standard deviation of the residuals of the 4-parameter non-linear regression model *f* calculated from the experimental (exp) data and the model. Thereby, the modified *V'* factor reflects the goodness of fit and thus the variance of all data points around the model. +In short: *V'* focuses more on the goodness of fit of the curve to the data points. + +##### Log2-Fold-Change + +The $Log_2FC$ denotes the magnitude, i.e. the effect size, of a response. It is defined as: + +$$ +Log_2 FC =log_2\frac{a_u}{a_l} +$$ + +where $a_u$ and $a_l$ are the upper and lower asymptotes. +In short: The $Log_2FC$ gives the raw difference (not accounting for the variation of the data points) between the upper and lower part of the curve.
+ +##### SSMD + +The Strictly Standardized Mean Difference (*SSMD*) is implemented following [Bray and Carpenter 2004](https://pubmed.ncbi.nlm.nih.gov/23469374/); [Zhang et al., 2007](https://doi.org/10.1016/j.ygeno.2006.12.014) as: + +$$ +SSMD = \frac{|\mu_l-\mu_u|}{\sqrt{\sigma^2_u+\sigma^2_l}} +$$ + +In short: The *SSMD* gives the difference between the upper and lower parts of the curve in units of standard deviation. In other words, it gives a difference weighted by the variability of the data. + +##### Curve-response-score (CRS) + +$$CRS= +\begin{cases} +\frac{fcScore+vScore+zScore}{3}*100 \quad otherwise\\ +0 \quad for \quad Z'_{mod.}<-0.5 \quad or \quad V'_{mod.}<-0.5 +\end{cases}$$ + +with + +$$fcScore= +\begin{cases} +1 \quad for \quad |log_2FC| > log_2FC_{max}\\ +\frac{|log_2FC|}{log_2FC_{max}} \quad otherwise +\end{cases}$$ + +and + +$$vScore=V'_{mod.}$$ + +and + +$$zScore= +\begin{cases} +1 \quad for \quad Z'_{mod.}>0.5\\ +\frac{Z'_{mod.}}{0.5} \quad for \quad 0.5 > Z'_{mod.}>-0.5 +\end{cases}$$ + +The *CRS* combines three measures used to describe the quality of a response curve: the effect size, defined as $Log_2FC$ and incorporated in the fcScore; the $V'_{mod.}$ factor, which equals the vScore; and the $Z'_{mod.}$ factor, used in the definition of the zScore. In the fcScore, the $Log_2FC$ is normalized by and thresholded at $Log_2FC_{max}=2.59$ . This cap is chosen so that features exhibiting very large changes are not overrated. The thresholds applied to $Z'_{mod.}$ in the zScore follow the common interpretation of the *Z'* factor (Zhang, Chung and Oldenburg 1999). For $Z'_{mod.}>0.5$ a bioassay is said to be excellent, since for $\sigma_l=\sigma_u$ a value of 0.5 is equivalent to a separation of 12 standard deviations between $\mu_u$ and $\mu_l$ . Accordingly, a value of -0.5 is equivalent to a separation of 4 standard deviations between $\mu_u$ and $\mu_l$ for $\sigma_l=\sigma_u$ .
The rather moderate lower threshold is particularly important for MALDI MS-based bioassays, which exhibit relatively high variance in the data. + +### Metrics subtab -The metrics screen enables to visualize different metrics (Z', V', SSMD, logFC, CRS as well as pEC50, etc.) as a function of **m/z**. The direction of the peaks (up or down) highlights the direction of regulation (if the intensity of the signal increases or decreases with the concentration). It is therefor useful to get a fast overview of the whole data set. The different metrics concentrate on different aspects of the quality of the curve. +The metrics screen visualizes different metrics (*Z'*, *V'*, *SSMD*, *logFC*, *CRS* as well as pEC50, etc.) as a function of **m/z**. The direction of the peaks (up or down) indicates the direction of regulation (whether the intensity of the signal increases or decreases with the concentration). This makes it useful for getting a fast overview of the whole data set. The different metrics concentrate on different aspects of the quality of the curve. -### QC tab +## QC tab The top part of the OC tab focuses on the (potential) peak used for re-calibration and enables the user to inspect the alignment of the (average) spectra per concentration. -The lower left part shows different metrics (both assay quality metrics like Z', V', CRS and MALDI parameters like total ion current as well as re-calibration shifts and PCA loadings) per spot in a target plate view. **This functionality is currently only featured for Bruker raw data. And wont be visible with the `mzML` input file format selected.** +The lower left part shows different metrics (both assay quality metrics like *Z'*, *V'*, *CRS* and MALDI parameters like total ion current as well as re-calibration shifts and PCA loadings) per spot in a target plate view. **This functionality is currently only available for Bruker raw data.
And won't be visible with the `mzML` input file format selected.** The lower right shows processing (and in case of Bruker data also some measurement meta data) as a summary. @@ -149,7 +241,7 @@ The lower right shows processing (and in case of Bruker data also some measureme knitr::include_graphics("figures/qc.png") ``` -### PCA tab +## PCA tab A PCA (Principle component analysis) enables a multivariate view to the data by dimensional reduction. Although, on its own its hard to identify biomarkers/regulated signals with it, the PCA is highly useful to judge the general concentration-dependent differences introduced by the treatment. A high separation of the different concentrations shows that some multivariate effects are in place were-as a low separation hints at either low effects overall or effects that are unique to some single (and most likely rather small) peaks. This is why the PCA can be a nice addition to the univariate analysis featured on the [Curves](#curve-screen)-screen The PCA can be generated by clicking on the `Perform PCA`-Button. @@ -165,9 +257,9 @@ The loading's can used to identify peaks that have a high influence to the score knitr::include_graphics("figures/pca_loadings.png") ``` -Using the `Summarise loadings`-button either the summarized (see figure above) or full (in a loadings vs **m/z** spectrum) loading's can be visualized. +Using the `Summarise loadings`-button, either the summarized (see figure above) or the full (as a loadings vs. **m/z** spectrum) loadings can be visualized.
Using the `Send to peak table`-button, the numeric loadings can be sent to the peak table on the [Curve](#curve-screen)-screen to easily investigate whether they overlap with univariate signals of interest (high scores in *Z'*, *V'* or *CRS*) or whether they represent a separate regulation caused by many smaller changes that are not strong enough to lead to high scores on their own. -### Cluster tab +## Cluster tab The cluster tab enables to cluster curves based on their shape to enable to detect signals of interest that follow a similar direction as one (or many) target signals. @@ -183,7 +275,7 @@ Using the slider the user needs to adjust the number of clusters to a reasonable knitr::include_graphics("figures/clustering_metrics.png") ``` -### Settings tab +## Settings tab ```{r, echo=FALSE, out.width="33%", fig.cap="Settings tab"} knitr::include_graphics("figures/settings.png") @@ -197,7 +289,7 @@ The `Peak window size` and `Peak method` setting enables to change the peak dete The `Exclude empty spectra` setting will exclude spectra that don't contain any signals. -#### Saving processing parameters +### Saving processing parameters To save results for a later usage the app includes the option to save all relevant processing parameters. This can be done by clicking: `Settings` -\> `Save settings`. If also the path to the data should be saved this needs to be after setting the directory but before loading the spectra. @@ -207,7 +299,7 @@ If such a file is found at the start-up of the app, the parameters will be loade As processing is typically fast, this is a more efficient (time & disk-space) process then to save the complete app-state including spectra and calculated values.
-### Save fitting parameters +## Save fitting parameters The curve fitting in the app is internally performed by the [nplr-package](https://github.com/fredcommo/nplr) that used the Richardson Formula for Logistic regression: diff --git a/manual.md b/manual.md index b1c5de4..e3ebe68 100644 --- a/manual.md +++ b/manual.md @@ -2,7 +2,9 @@ title: "M2ara Manual" author: "Thomas Enzlein" date: "09.08.2024" -output: html_document +output: + html_document: + mathjax: "//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" --- @@ -25,7 +27,7 @@ The following features were already part of `MALDIcellassay`: - graphical user interface - interactive data exploration - support for [mzML](#mzml) data \* -- calculation of quality metrics (Z', V', log2FC, CRS) \* +- calculation of quality metrics (*Z'*, *V'*, *log2FC*, *CRS*) \* - feature ranking by metric \* - principle component analysis (PCA) - curve clustering @@ -33,15 +35,23 @@ The following features were already part of `MALDIcellassay`: \* The features marked with a asterisks were re-implemented to `MALDIcellassay`. -## General information +# General information The blue question mark icons () throughout the application can be clicked and provide further information on the specific settings. -## Requirements to the raw data {#requirements-to-the-raw-data} +# Requirements to the raw data {#requirements-to-the-raw-data} -For the curve fitting to work all spectra must have an associated concentration. This concentration can be supplied in two ways: 1. as filename (see below) 2. as mapping file (\*.txt) containing the concentrations of the spectra in the right order one per line. Concentration mapping file The mapping file can be uploaded before loading the spectra using Settings -\> Conc. mapping. This method can be used for [Bruker](#bruker) and [mzML](#mzml). 
As said the concentrations need to be in the right order, one concentration per line and the number of concentrations must match the number of spectra. Also please don't use any units or other characters (which cant be converted to numbers). +For best results, a concentration curve should consist of at least 7 points (better 9 points). +To calculate all necessary [Scores](#scores) there should be at least two replicates (better 4) per concentration. -### Bruker Flex format (\*.fid) {#bruker} +For the curve fitting to work all spectra must have an associated concentration. This concentration can be supplied in two ways: +1. as filename (see below) +2. as mapping file (\*.txt) containing the concentrations of the spectra in the right order one per line. + +## Concentration mapping file +The mapping file can be uploaded before loading the spectra using Settings -\> Conc. mapping. This method can be used for [Bruker](#bruker) and [mzML](#mzml). As stated above, the concentrations need to be in the right order, one concentration per line, and the number of concentrations must match the number of spectra. Also, please don't use any units or other characters that can't be converted to numbers. + +## Bruker Flex format (\*.fid) {#bruker} This application supports Bruker flex raw data as generated by instruments of the Bruker-Flex series (e.g. RapiFleX, UltraFleX, AutoFleX). At the moment there is no support for timsTOF or SolariX data directly but import via [mzML](#mzml) is possible. @@ -65,7 +75,7 @@ etc. Briefly: Each spectrum has to reside in a folder which is named according to the concentration used to treat the cells in the respective sample. The number of measurement replicates per concentration is unlimited (should typically be at least four to compensate for artifacts from e.g. matrix heterogeneity or preparation). -### mzML {#mzml} +## mzML {#mzml} For mzML-import the mzML files need to be named with the corresponding concentration used for treatment.
Please put all technical replicates for a given concentration into the same mzML file. @@ -77,7 +87,7 @@ For mzML-import the mzML files need to be named with the corresponding concentra etc. ``` -## Step-by-step {#curve-screen} +# Step-by-step {#curve-screen}
Interface of the app @@ -95,7 +105,7 @@ etc. 5. If you want to save the curve fit and peak profile of a given *m/z*-value you can click the download button below the peak table to save your results as \*.csv. -## Analysis pipeline +# Analysis pipeline The analysis pipeline consist of the following steps (see figure below for a graphical overview): @@ -110,7 +120,7 @@ The analysis pipeline consist of the following steps (see figure below for a gra 9. `Intensity matrix`: The peaks of the average spectra are transformed into a matrix with columns representing *m/z* values and rows representing concentrations whereas cells contain the respective intensity. 10. `Varience filtering` is applied. 11. `Curve fitting` is performed. -12. `Quality metrics` are calculated (V', Z', SSMD, Log2FC, CRS). +12. `Quality metrics` are calculated (*V'*, *Z'*, *SSMD*, *Log2FC*, *CRS*). 13. The peaks can be selected in the `Peak table`. 14. The respective dose-response curve as well as the peak profile is visualized and might be saved. @@ -119,11 +129,11 @@ The analysis pipeline consist of the following steps (see figure below for a gra

Schematic outline of the analysis workflow

-## Individual screens +# Individual screens -### Main Tab +## Main Tab -#### Curve subtab +### Curve subtab The main [Curve](#curve-screen) screen is intended for a univariate analysis in a peak-by-peak manner. @@ -133,15 +143,97 @@ The upper left show's a zoom-in to the corresponding individual peaks. The level Below the two plots the peak table is shown. Here all found signals as well as all metrics are displayed. The two upper plots will change if a signal is selected. -#### Metrics subtab +#### Scores + +**M²ara** comes with a variety of scores/metrics that help judge the quality of response curves. + +##### Modified Z': + +In the pharmaceutical industry and in research, the quality of a bioassay is assessed by common metrics that rely on negative and positive controls [Zhang et al., 1999](https://pubmed.ncbi.nlm.nih.gov/10838414/), [Iversen et al., 2006](https://doi.org/10.1177/1087057105285610), [Ravkin et al., 2004](http://www.ravkin.net/articles/5322-7.pdf). However, to explore unknown cellular drug effects in whole-cell MALDI MS bioassays and to classify *m/z* features as up-, down- or non-regulated, characteristic measures need to be derived directly from the concentration-response data. First, to assess the variability within the assay data relative to the effective window size, a modified form of the *Z'* factor [Zhang et al., 1999](https://pubmed.ncbi.nlm.nih.gov/10838414/), defined by + +$$ +Z'_{mod.} = 1-\frac{3*(\sigma_u+\sigma_l)}{|\mu_u-\mu_l|} +$$ + +is implemented in **M²ara**. The modified *Z'* score relates the distance between the means ( $\mu$ , larger is better) to the standard deviations ( $\sigma$ , smaller is better) of the upper ( $_u$ ) and lower ( $_l$ ) ends of the curve.
+ +##### Modified V': + +The modified *V'* [Ravkin et al., 2004](http://www.ravkin.net/articles/5322-7.pdf) is introduced to assess the root-mean-square deviation of the response data relative to the log-logistic model fit, determined by + +$$ +V'_{mod.}=1-6*\frac{\sigma_f}{|a_u-a_l|} +$$ + +with + +$$ +\sigma_f=\sqrt{\frac{1}{N}\sum(f_{exp}-f)^2} +$$ + +where $\sigma_f$ is the standard deviation of the residuals of the 4-parameter non-linear regression model *f* calculated from the experimental (exp) data and the model. Thereby, the modified *V'* factor reflects the goodness of fit and thus the variance of all data points around the model. +In short: *V'* focuses more on the goodness of fit of the curve to the data points. + +##### Log2-Fold-Change + +The $Log_2FC$ denotes the magnitude, i.e. the effect size, of a response. It is defined as: + +$$ +Log_2 FC =log_2\frac{a_u}{a_l} +$$ + +where $a_u$ and $a_l$ are the upper and lower asymptotes. +In short: The $Log_2FC$ gives the raw difference (not accounting for the variation of the data points) between the upper and lower part of the curve. + +##### SSMD + +The Strictly Standardized Mean Difference (*SSMD*) is implemented following [Bray and Carpenter 2004](https://pubmed.ncbi.nlm.nih.gov/23469374/); [Zhang et al., 2007](https://doi.org/10.1016/j.ygeno.2006.12.014) as: + +$$ +SSMD = \frac{|\mu_l-\mu_u|}{\sqrt{\sigma^2_u+\sigma^2_l}} +$$ + +In short: The *SSMD* gives the difference between the upper and lower parts of the curve in units of standard deviation. In other words, it gives a difference weighted by the variability of the data.
+ +##### Curve-response-score (CRS) + +$$CRS= +\begin{cases} +\frac{fcScore+vScore+zScore}{3}*100 \quad otherwise\\ +0 \quad for \quad Z'_{mod.}<-0.5 \quad or \quad V'_{mod.}<-0.5 +\end{cases}$$ + +with + +$$fcScore= +\begin{cases} +1 \quad for \quad |log_2FC| > log_2FC_{max}\\ +\frac{|log_2FC|}{log_2FC_{max}} \quad otherwise +\end{cases}$$ + +and + +$$vScore=V'_{mod.}$$ + +and + +$$zScore= +\begin{cases} +1 \quad for \quad Z'_{mod.}>0.5\\ +\frac{Z'_{mod.}}{0.5} \quad for \quad 0.5 > Z'_{mod.}>-0.5 +\end{cases}$$ + +The *CRS* combines three measures used to describe the quality of a response curve: the effect size, defined as $Log_2FC$ and incorporated in the fcScore; the $V'_{mod.}$ factor, which equals the vScore; and the $Z'_{mod.}$ factor, used in the definition of the zScore. In the fcScore, the $Log_2FC$ is normalized by and thresholded at $Log_2FC_{max}=2.59$ . This cap is chosen so that features exhibiting very large changes are not overrated. The thresholds applied to $Z'_{mod.}$ in the zScore follow the common interpretation of the *Z'* factor (Zhang, Chung and Oldenburg 1999). For $Z'_{mod.}>0.5$ a bioassay is said to be excellent, since for $\sigma_l=\sigma_u$ a value of 0.5 is equivalent to a separation of 12 standard deviations between $\mu_u$ and $\mu_l$ . Accordingly, a value of -0.5 is equivalent to a separation of 4 standard deviations between $\mu_u$ and $\mu_l$ for $\sigma_l=\sigma_u$ . The rather moderate lower threshold is particularly important for MALDI MS-based bioassays, which exhibit relatively high variance in the data. + +### Metrics subtab -The metrics screen enables to visualize different metrics (Z', V', SSMD, logFC, CRS as well as pEC50, etc.) as a function of **m/z**. The direction of the peaks (up or down) highlights the direction of regulation (if the intensity of the signal increases or decreases with the concentration). It is therefor useful to get a fast overview of the whole data set.
The different metrics concentrate on different aspects of the quality of the curve. +The metrics screen visualizes different metrics (*Z'*, *V'*, *SSMD*, *logFC*, *CRS* as well as pEC50, etc.) as a function of **m/z**. The direction of the peaks (up or down) indicates the direction of regulation (whether the intensity of the signal increases or decreases with the concentration). This makes it useful for getting a fast overview of the whole data set. The different metrics concentrate on different aspects of the quality of the curve. -### QC tab +## QC tab The top part of the OC tab focuses on the (potential) peak used for re-calibration and enables the user to inspect the alignment of the (average) spectra per concentration. -The lower left part shows different metrics (both assay quality metrics like Z', V', CRS and MALDI parameters like total ion current as well as re-calibration shifts and PCA loadings) per spot in a target plate view. **This functionality is currently only featured for Bruker raw data. And wont be visible with the `mzML` input file format selected.** +The lower left part shows different metrics (both assay quality metrics like *Z'*, *V'*, *CRS* and MALDI parameters like total ion current as well as re-calibration shifts and PCA loadings) per spot in a target plate view. **This functionality is currently only available for Bruker raw data and won't be visible when the `mzML` input file format is selected.** The lower right shows processing (and in case of Bruker data also some measurement meta data) as a summary. @@ -150,7 +242,7 @@ The lower right shows processing (and in case of Bruker data also some measureme

Example of QC-tab

-### PCA tab +## PCA tab A PCA (Principle component analysis) enables a multivariate view to the data by dimensional reduction. Although, on its own its hard to identify biomarkers/regulated signals with it, the PCA is highly useful to judge the general concentration-dependent differences introduced by the treatment. A high separation of the different concentrations shows that some multivariate effects are in place were-as a low separation hints at either low effects overall or effects that are unique to some single (and most likely rather small) peaks. This is why the PCA can be a nice addition to the univariate analysis featured on the [Curves](#curve-screen)-screen The PCA can be generated by clicking on the `Perform PCA`-Button. @@ -168,9 +260,9 @@ The loading's can used to identify peaks that have a high influence to the score

PCA loadings plot

-Using the `Summarise loadings`-button either the summarized (see figure above) or full (in a loadings vs **m/z** spectrum) loading's can be visualized. Using the `Send to peak table`-button the numeric loading's can be send to the peak table on the [Curve](#curve-screen)-screen to investigate easily if the overlap with univariate signals of interest (high scores in Z', V' or CRS) or if the represent a separate regulation cause by many smaller changes not strong enough to lead to high scores on their own. +Using the `Summarise loadings`-button, either the summarized (see figure above) or the full (as a loadings vs. **m/z** spectrum) loadings can be visualized. Using the `Send to peak table`-button, the numeric loadings can be sent to the peak table on the [Curve](#curve-screen)-screen to easily investigate whether they overlap with univariate signals of interest (high scores in *Z'*, *V'* or *CRS*) or whether they represent a separate regulation caused by many smaller changes that are not strong enough to lead to high scores on their own. -### Cluster tab +## Cluster tab The cluster tab enables to cluster curves based on their shape to enable to detect signals of interest that follow a similar direction as one (or many) target signals. @@ -188,7 +280,7 @@ Using the slider the user needs to adjust the number of clusters to a reasonable

Clustering metrics

-### Settings tab +## Settings tab
Settings tab @@ -203,7 +295,7 @@ The `Peak window size` and `Peak method` setting enables to change the peak dete The `Exclude empty spectra` setting will exclude spectra that don't contain any signals. -#### Saving processing parameters +### Saving processing parameters To save results for a later usage the app includes the option to save all relevant processing parameters. This can be done by clicking: `Settings` -\> `Save settings`. If also the path to the data should be saved this needs to be after setting the directory but before loading the spectra. @@ -213,7 +305,7 @@ If such a file is found at the start-up of the app, the parameters will be loade As processing is typically fast, this is a more efficient (time & disk-space) process then to save the complete app-state including spectra and calculated values. -### Save fitting parameters +## Save fitting parameters The curve fitting in the app is internally performed by the [nplr-package](https://github.com/fredcommo/nplr) that used the Richardson Formula for Logistic regression: diff --git a/tests/testthat/settings_mzML_data.csv b/tests/testthat/settings_mzML_data.csv index 1a24aa4..aef310e 100644 --- a/tests/testthat/settings_mzML_data.csv +++ b/tests/testthat/settings_mzML_data.csv @@ -1,2 +1,2 @@ "concUnits","avgMethod","normMeth","VarFilterMethod","errorbars","metric","plateStat","pcaX","pcaY","pcaEllipse","fileFormat","peakMethod","zoom","pcaAlpha","pcaBeta","num_cluster","SinglePointRecal","plateScale","simpleLoadings","checkEmpty","SNR","normMz","normTol","alignTol","binTol","halfWindowSize","smooth","rmBl","sqrtTrans","monoisotopicFilter","dir" -"uM","mean","mz","none","none","CRS","Recal-shift","PC1","PC2","0.67","mzml","SuperSmoother",4,-3,-3,4,TRUE,FALSE,TRUE,TRUE,3,354.1418,0.1,0,100,20,TRUE,TRUE,FALSE,FALSE,"mzML" +"uM","mean","mz","none","none","CRS","Recal-shift","PC1","PC2","0.67","mzml","SuperSmoother",4,-3,-3,4,FALSE,FALSE,TRUE,TRUE,3,354.1418,0.1,0,100,3,FALSE,TRUE,FALSE,FALSE,"mzML"
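Reviewers checking the new *Scores* section may find it helpful to see the documented formulas transcribed into base R. The sketch below is built on the manual text alone. The function names `z_mod()`, `v_mod()`, `log2fc()`, `ssmd()` and `crs()` are hypothetical and are not part of M²ara or `MALDIcellassay`. The inputs are also assumptions: replicate intensities at the upper and lower end of the curve, observed values with model predictions, and fitted asymptotes. The sketch does not describe how the app actually selects or computes these quantities.

```r
# Hypothetical helpers (not M2ara's API): score formulas from the manual, base R only.

# Modified Z': upper/lower are replicate intensities at the two ends of the curve.
# Assumption: sd() is the sample standard deviation (N - 1 denominator).
z_mod <- function(upper, lower) {
  1 - 3 * (sd(upper) + sd(lower)) / abs(mean(upper) - mean(lower))
}

# Modified V': sigma_f is the root-mean-square of the residuals (1/N, as in the manual).
v_mod <- function(observed, fitted, a_u, a_l) {
  sigma_f <- sqrt(mean((observed - fitted)^2))
  1 - 6 * sigma_f / abs(a_u - a_l)
}

# Log2 fold change between the fitted upper and lower asymptotes.
log2fc <- function(a_u, a_l) {
  log2(a_u / a_l)
}

# Strictly standardized mean difference between the two ends of the curve.
ssmd <- function(upper, lower) {
  abs(mean(lower) - mean(upper)) / sqrt(var(upper) + var(lower))
}

# Curve response score (0-100): zero if Z'_mod or V'_mod falls below -0.5,
# otherwise the mean of fcScore, vScore and zScore scaled by 100.
crs <- function(z, v, fc, fc_max = 2.59) {
  if (z < -0.5 || v < -0.5) {
    return(0)
  }
  fc_score <- if (abs(fc) > fc_max) 1 else abs(fc) / fc_max
  z_score <- if (z > 0.5) 1 else z / 0.5
  (fc_score + v + z_score) / 3 * 100
}

# Toy replicate intensities, illustrative only (not from the example data sets).
upper <- c(10, 11, 9, 10)
lower <- c(2, 3, 1, 2)
z <- z_mod(upper, lower)  # about 0.39
s <- ssmd(upper, lower)   # about 6.93
```

With equal standard deviations σ at both ends, the *Z'* formula reduces to 1 − 6σ/|μu − μl|. A separation of 12σ therefore gives 0.5, and a separation of 4σ gives −0.5.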