From 2bf42506f1b7572443dcebe357f71b6f65557389 Mon Sep 17 00:00:00 2001 From: thomas-enzlein Date: Thu, 15 Aug 2024 09:55:07 +0200 Subject: [PATCH 01/13] - update manual with scores --- app.R | 22 +++---- components/mainpanel.R | 2 +- manual.Rmd | 124 +++++++++++++++++++++++++++++++++------ manual.md | 129 ++++++++++++++++++++++++++++++++++------- 4 files changed, 227 insertions(+), 50 deletions(-) diff --git a/app.R b/app.R index e7f4a4b..8611d72 100644 --- a/app.R +++ b/app.R @@ -1,11 +1,11 @@ -# check if all required packages are installed -source("functions/checkInstalledPackages.R") -checkInstalledPackages(req_file = "req.txt") - -knit("manual.Rmd", quiet = TRUE) - -source("components/ui.R") -source("components/server.R") - -# Run the application -shinyApp(ui = ui, server = server) +# check if all required packages are installed +source("functions/checkInstalledPackages.R") +checkInstalledPackages(req_file = "req.txt") + +knit("manual.Rmd", quiet = TRUE) + +source("components/ui.R") +source("components/server.R") + +# Run the application +shinyApp(ui = ui, server = server) diff --git a/components/mainpanel.R b/components/mainpanel.R index 912a94b..48c94c7 100644 --- a/components/mainpanel.R +++ b/components/mainpanel.R @@ -67,7 +67,7 @@ appMainPanel <- function(defaults) { ) ), #### Manual tab #### - tabPanel("Manual", htmltools::includeMarkdown("manual.md") + tabPanel("Manual", withMathJax(htmltools::includeMarkdown("manual.md")) ) ) ) diff --git a/manual.Rmd b/manual.Rmd index 8b716dd..3eba47d 100644 --- a/manual.Rmd +++ b/manual.Rmd @@ -2,7 +2,10 @@ title: "M2ara Manual" author: "Thomas Enzlein" date: "09.08.2024" -output: html_document +output: + html_document: + mathjax: "//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" +runtime: shiny --- ```{r setup, include=FALSE} @@ -35,15 +38,23 @@ The following features were already part of `MALDIcellassay`: \* The features marked with a asterisks were re-implemented to `MALDIcellassay`. 
-## General information +# General information The blue question mark icons (`r icon("question-circle")`) throughout the application can be clicked and provide further information on the specific settings. -## Requirements to the raw data {#requirements-to-the-raw-data} +# Requirements to the raw data {#requirements-to-the-raw-data} -For the curve fitting to work all spectra must have an associated concentration. This concentration can be supplied in two ways: 1. as filename (see below) 2. as mapping file (\*.txt) containing the concentrations of the spectra in the right order one per line. Concentration mapping file The mapping file can be uploaded before loading the spectra using Settings -\> Conc. mapping. This method can be used for [Bruker](#bruker) and [mzML](#mzml). As said the concentrations need to be in the right order, one concentration per line and the number of concentrations must match the number of spectra. Also please don't use any units or other characters (which cant be converted to numbers). +For best results, a concentration curve should consist of at least 7 points (better 9 points). +To calculate all necessary [Scores](#scores) there should be at least two replicates (better 4) per concentration. -### Bruker Flex format (\*.fid) {#bruker} +For the curve fitting to work all spectra must have an associated concentration. This concentration can be supplied in two ways: +1. as filename (see below) +2. as mapping file (\*.txt) containing the concentrations of the spectra in the right order one per line. + +## Concentration mapping file +The mapping file can be uploaded before loading the spectra using Settings -\> Conc. mapping. This method can be used for [Bruker](#bruker) and [mzML](#mzml). As said the concentrations need to be in the right order, one concentration per line and the number of concentrations must match the number of spectra. Also please don't use any units or other characters (which cant be converted to numbers). 
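The mapping-file rules described here (one numeric concentration per line, in the same order and number as the spectra) can be checked before upload. The following is a minimal base-R sketch; `read_conc_mapping()` and its arguments are hypothetical names used for illustration and are not part of M2ara.

```r
# Hypothetical helper (not part of M2ara): read a concentration mapping file
# and check it against the number of spectra that will be loaded.
read_conc_mapping <- function(path, n_spectra) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]                # ignore empty lines
  conc <- suppressWarnings(as.numeric(lines))  # entries with units/letters become NA
  if (anyNA(conc)) {
    stop("Non-numeric entries in mapping file: ",
         paste(lines[is.na(conc)], collapse = ", "))
  }
  if (length(conc) != n_spectra) {
    stop("Expected ", n_spectra, " concentrations, found ", length(conc))
  }
  conc
}

# Usage: a temporary file stands in for the real mapping file
tmp <- tempfile(fileext = ".txt")
writeLines(c("0", "0.01", "0.1", "1"), tmp)
conc <- read_conc_mapping(tmp, n_spectra = 4)
```

Failing early on a count mismatch or a stray unit string is a design choice of this sketch; the app may handle such input differently.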
+ +## Bruker Flex format (\*.fid) {#bruker} This application supports Bruker flex raw data as generated by instruments of the Bruker-Flex series (e.g. RapiFleX, UltraFleX, AutoFleX). At the moment there is no support for timsTOF or SolariX data directly but import via [mzML](#mzml) is possible. @@ -67,7 +78,7 @@ etc. Briefly: Each spectrum has to reside in a folder which is named according to the concentration used to treat the cells in the respective sample. The number of measurement replicates per concentration is unlimited (should typically be at least four to compensate for artifacts from e.g. matrix heterogeneity or preparation). -### mzML {#mzml} +## mzML {#mzml} For mzML-import the mzML files need to be named with the corresponding concentration used for treatment. Please put all technical replicates for a given concentration into the same mzML file. @@ -79,7 +90,7 @@ For mzML-import the mzML files need to be named with the corresponding concentra etc. ``` -## Step-by-step {#curve-screen} +# Step-by-step {#curve-screen} ```{r, echo=FALSE, out.width="66%", fig.cap="Interface of the app"} knitr::include_graphics("figures/interface.png") @@ -96,7 +107,7 @@ knitr::include_graphics("figures/interface.png") 5. If you want to save the curve fit and peak profile of a given *m/z*-value you can click the download button below the peak table to save your results as \*.csv. -## Analysis pipeline +# Analysis pipeline The analysis pipeline consist of the following steps (see figure below for a graphical overview): @@ -119,11 +130,11 @@ The analysis pipeline consist of the following steps (see figure below for a gra knitr::include_graphics("figures/pipeline.png") ``` -## Individual screens +# Individual screens -### Main Tab +## Main Tab -#### Curve subtab +### Curve subtab The main [Curve](#curve-screen) screen is intended for a univariate analysis in a peak-by-peak manner. @@ -133,11 +144,88 @@ The upper left show's a zoom-in to the corresponding individual peaks. 
The level Below the two plots the peak table is shown. Here all found signals as well as all metrics are displayed. The two upper plots will change if a signal is selected. -#### Metrics subtab +#### Scores + +**M²ara** comes with a variety of helpful scores/metrics that are meant to help judging the quality of response curves. + +##### Modified Z': + +In pharmaceutical industry and research, the quality of a bioassay is assessed by common metrics that rely on a negative and positive control [Zhang et al., Iversen et al., Ravkin et al.] . However, in order to be able to explore unknown cellular drug effects in whole-cell MALDI MS bioassays and to classify m/z features as either up-, down- or non-regulated, characteristic measures need to be deduced from the concentration response data directly. First, to assess the variability within the assay data relative to the effective window size, a modified form of the Z’ factor (Zhang, Chung and Oldenburg 1999), defined by $$ +Z'_{mod.} = 1-\frac{3*(\sigma_u+\sigma_l)}{| \mu_u-\mu_l |} +$$ is implemented into **M²ara**. The mod. Z' score helps to make a judgment about the distance of the means ($\mu$, more is better) and standard deviation ($\sigma$, less is better) of the upper ($_u$) and lower ($l$) end of the curve. + +##### Modified V': + +The modified *V*’ [Ravkin et al.] is introduced to assess the root-mean-square deviation of the response data relative to the log-logistic model fit, determined by +$$ +V'_{mod.}=1-6*\frac{\sigma_f}{|a_u-a_l|} +$$ + +with + +$$ +\sigma_f=\sqrt{\frac{1}{N}\sum(f_exp-f)^2} +$$ + +where $\sigma_f$ is the standard deviation of the residuals of the 4-parameter non-linear regression model *f* calculated from the experimental (exp) data and the model. Hereby, the modified *V*’ factor reflects the goodness of the fit and thus the variance within all data points described by the model. +In short: V' focuses more on the goodness of fit of the curve to the data points. 
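The two scores defined above can be sketched in a few lines of base R. This is an illustration of the formulas only, not the code used by M2ara: the function names are hypothetical, `upper` and `lower` are assumed to hold the replicate intensities at the upper and lower end of the curve, and `sd()` uses the sample (n-1) standard deviation, which the manual does not specify.

```r
# Modified Z': 1 - 3 * (sd_u + sd_l) / |mean_u - mean_l|
z_prime_mod <- function(upper, lower) {
  1 - 3 * (sd(upper) + sd(lower)) / abs(mean(upper) - mean(lower))
}

# Modified V': 1 - 6 * sigma_f / |a_u - a_l|, where sigma_f is the
# root-mean-square residual between observed and fitted responses
v_prime_mod <- function(observed, fitted, a_u, a_l) {
  sigma_f <- sqrt(mean((observed - fitted)^2))
  1 - 6 * sigma_f / abs(a_u - a_l)
}

# Made-up example values
upper <- c(9.5, 10.0, 10.5, 10.0)
lower <- c(1.0, 1.5, 0.5, 1.0)
z <- z_prime_mod(upper, lower)

observed <- c(1.1, 0.9, 5.2, 4.8, 10.1, 9.9)
fitted   <- c(1.0, 1.0, 5.0, 5.0, 10.0, 10.0)
v <- v_prime_mod(observed, fitted, a_u = 10, a_l = 1)
```

With these example values both scores come out well above 0.5, which reflects tight replicates and small residuals relative to the response window.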
+
+##### Log2-Fold-Change
+
+The $Log_2FC$ denotes the magnitude i.e. effect size of a response. It is defined as:
+
+$$
+Log_2 FC =log_2\frac{a_u}{a_l}
+$$
+where $a_u$ and $a_l$ are the upper and lower asymptotes.
+In short: The $Log_2FC$ gives the raw (no variation of data points considered) difference between the upper and lower part of the curve.
+
+##### SSMD
+
+The Strictly Standardized Mean Difference (SSMD) is implemented (Bray and Carpenter 2004; Zhang 2007), with:
+
+$$
+SSMD = \frac{|\mu_l-\mu_u|}{\sqrt{\sigma^2_u+\sigma^2_l}}
+$$
+In short: The SSMD gives the difference between the upper and lower part of the curves in units of standard deviation. Or in other words, it gives a weighted difference.
+
+##### Curve-response-score (CRS)
+
+$$
+CRS=
+\begin{cases}
+\frac{fcScore+vScore+zScore}{3}*100,\\
+0 \quad for \quad Z'_{mod.}<-0.5 \quad or \quad V'_{mod.}<-0.5
+\end{cases}
+$$
+with
+
+$$
+fcScore=
+\begin{cases}
+1 \quad for \quad |log_2FC| > log_2FC_{max}\\
+\frac{|log_2FC|}{log_2FC_{max}} \quad otherwise
+\end{cases}
+$$
+and
+$$
+vScore=V'_{mod.}
+$$
+and
+$$
+zScore=
+\begin{cases}
+1 \quad for \quad Z'_{mod.}>0.5\\
+\frac{Z'_{mod.}}{0.5} \quad for \quad 0.5 > Z'_{mod.}>-0.5
+\end{cases}
+$$
+The CRS combines three measures used to describe the quality of a response curve: the effect size defined as $Log_2FC$ and incorporated in the fcScore, the $V'_{mod.}$ factor being equal to the vScore and the $Z'_{mod.}$ factor used in the definition of the zScore. In the fcScore, the $Log_2FC$ is normalized by and thresholded at $Log_2FC_{max}$=2.59. This threshold is chosen so that features exhibiting substantial changes are not overrated. The restriction of the $Z'_{mod.}$ factor to the zScore is made due to the common interpretation of the Z’ factor (Zhang, Chung and Oldenburg 1999). For $Z'_{mod.}>0.5$ a bioassay is said to be excellent, since for $\sigma_l=\sigma_u$ a value of 0.5 is equivalent to a separation of 12 standard deviations between $\mu_u$ and $\mu_l$.
Accordingly, a value of -0.5 is equivalent to a separation of 4 standard deviations between $\mu_u$ and $\mu_l$ for $\sigma_l=\sigma_u$. The rather moderate lower threshold is of particular importance for MALDI MS-based bioassays exhibiting a relatively high variance in the data. + +### Metrics subtab The metrics screen enables to visualize different metrics (Z', V', SSMD, logFC, CRS as well as pEC50, etc.) as a function of **m/z**. The direction of the peaks (up or down) highlights the direction of regulation (if the intensity of the signal increases or decreases with the concentration). It is therefor useful to get a fast overview of the whole data set. The different metrics concentrate on different aspects of the quality of the curve. -### QC tab +## QC tab The top part of the OC tab focuses on the (potential) peak used for re-calibration and enables the user to inspect the alignment of the (average) spectra per concentration. @@ -149,7 +237,7 @@ The lower right shows processing (and in case of Bruker data also some measureme knitr::include_graphics("figures/qc.png") ``` -### PCA tab +## PCA tab A PCA (Principle component analysis) enables a multivariate view to the data by dimensional reduction. Although, on its own its hard to identify biomarkers/regulated signals with it, the PCA is highly useful to judge the general concentration-dependent differences introduced by the treatment. A high separation of the different concentrations shows that some multivariate effects are in place were-as a low separation hints at either low effects overall or effects that are unique to some single (and most likely rather small) peaks. This is why the PCA can be a nice addition to the univariate analysis featured on the [Curves](#curve-screen)-screen The PCA can be generated by clicking on the `Perform PCA`-Button.
@@ -167,7 +255,7 @@ knitr::include_graphics("figures/pca_loadings.png") Using the `Summarise loadings`-button either the summarized (see figure above) or full (in a loadings vs **m/z** spectrum) loading's can be visualized. Using the `Send to peak table`-button the numeric loading's can be send to the peak table on the [Curve](#curve-screen)-screen to investigate easily if the overlap with univariate signals of interest (high scores in Z', V' or CRS) or if the represent a separate regulation cause by many smaller changes not strong enough to lead to high scores on their own. -### Cluster tab +## Cluster tab The cluster tab enables to cluster curves based on their shape to enable to detect signals of interest that follow a similar direction as one (or many) target signals. @@ -183,7 +271,7 @@ Using the slider the user needs to adjust the number of clusters to a reasonable knitr::include_graphics("figures/clustering_metrics.png") ``` -### Settings tab +## Settings tab ```{r, echo=FALSE, out.width="33%", fig.cap="Settings tab"} knitr::include_graphics("figures/settings.png") @@ -197,7 +285,7 @@ The `Peak window size` and `Peak method` setting enables to change the peak dete The `Exclude empty spectra` setting will exclude spectra that don't contain any signals. -#### Saving processing parameters +### Saving processing parameters To save results for a later usage the app includes the option to save all relevant processing parameters. This can be done by clicking: `Settings` -\> `Save settings`. If also the path to the data should be saved this needs to be after setting the directory but before loading the spectra. @@ -207,7 +295,7 @@ If such a file is found at the start-up of the app, the parameters will be loade As processing is typically fast, this is a more efficient (time & disk-space) process then to save the complete app-state including spectra and calculated values. 
-### Save fitting parameters +## Save fitting parameters The curve fitting in the app is internally performed by the [nplr-package](https://github.com/fredcommo/nplr) that used the Richardson Formula for Logistic regression: diff --git a/manual.md b/manual.md index df46fa4..7273b5e 100644 --- a/manual.md +++ b/manual.md @@ -2,7 +2,10 @@ title: "M2ara Manual" author: "Thomas Enzlein" date: "09.08.2024" -output: html_document +output: + html_document: + mathjax: "//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" +runtime: shiny --- @@ -16,8 +19,8 @@ This manual describes some of the details of the inner-workings of the `MALDIcel The following features were already part of `MALDIcellassay`: - pre-processing though the `MALDIquant`-package -- re-calibration to a single **m/z** (single point-re-calibration) -- normalization to a single **m/z** +- re-calibration to a single *m/z* (single point-re-calibration) +- normalization to a single *m/z* - fitting curves to the data using the `nplr`-package **M2ara** adds the following features: @@ -26,21 +29,30 @@ The following features were already part of `MALDIcellassay`: - interactive data exploration - support for [mzML](#mzml) data \* - calculation of quality metrics (Z', V', log2FC, CRS) \* +- feature ranking by metric \* - principle component analysis (PCA) - curve clustering - outlier detection \* \* The features marked with a asterisks were re-implemented to `MALDIcellassay`. -## General information +# General information The blue question mark icons () throughout the application can be clicked and provide further information on the specific settings. -## Requirements to the raw data {#requirements-to-the-raw-data} +# Requirements to the raw data {#requirements-to-the-raw-data} -For the curve fitting to work all spectra must have an associated concentration. This concentration can be supplied in two ways: 1. as filename (see below) 2. 
as mapping file (\*.txt) containing the concentrations of the spectra in the right order one per line. Concentration mapping file The mapping file can be uploaded before loading the spectra using Settings -\> Conc. mapping. This method can be used for [Bruker](#bruker) and [mzML](#mzml). As said the concentrations need to be in the right order, one concentration per line and the number of concentrations must match the number of spectra. Also please don't use any units or other characters (which cant be converted to numbers). +For best results, a concentration curve should consist of at least 7 points (better 9 points). +To calculate all necessary [Scores](#scores) there should be at least two replicates (better 4) per concentration. -### Bruker Flex format (\*.fid) {#bruker} +For the curve fitting to work all spectra must have an associated concentration. This concentration can be supplied in two ways: +1. as filename (see below) +2. as mapping file (\*.txt) containing the concentrations of the spectra in the right order one per line. + +## Concentration mapping file +The mapping file can be uploaded before loading the spectra using Settings -\> Conc. mapping. This method can be used for [Bruker](#bruker) and [mzML](#mzml). As said the concentrations need to be in the right order, one concentration per line and the number of concentrations must match the number of spectra. Also please don't use any units or other characters (which cant be converted to numbers). + +## Bruker Flex format (\*.fid) {#bruker} This application supports Bruker flex raw data as generated by instruments of the Bruker-Flex series (e.g. RapiFleX, UltraFleX, AutoFleX). At the moment there is no support for timsTOF or SolariX data directly but import via [mzML](#mzml) is possible. @@ -64,7 +76,7 @@ etc. Briefly: Each spectrum has to reside in a folder which is named according to the concentration used to treat the cells in the respective sample. 
The number of measurement replicates per concentration is unlimited (should typically be at least four to compensate for artifacts from e.g. matrix heterogeneity or preparation). -### mzML {#mzml} +## mzML {#mzml} For mzML-import the mzML files need to be named with the corresponding concentration used for treatment. Please put all technical replicates for a given concentration into the same mzML file. @@ -76,7 +88,7 @@ For mzML-import the mzML files need to be named with the corresponding concentra etc. ``` -## Step-by-step {#curve-screen} +# Step-by-step {#curve-screen}
Interface of the app @@ -94,7 +106,7 @@ etc. 5. If you want to save the curve fit and peak profile of a given *m/z*-value you can click the download button below the peak table to save your results as \*.csv. -## Analysis pipeline +# Analysis pipeline The analysis pipeline consist of the following steps (see figure below for a graphical overview): @@ -118,11 +130,11 @@ The analysis pipeline consist of the following steps (see figure below for a gra

Schematic outline of the analysis workflow

-## Individual screens +# Individual screens -### Main Tab +## Main Tab -#### Curve subtab +### Curve subtab The main [Curve](#curve-screen) screen is intended for a univariate analysis in a peak-by-peak manner. @@ -132,11 +144,88 @@ The upper left show's a zoom-in to the corresponding individual peaks. The level Below the two plots the peak table is shown. Here all found signals as well as all metrics are displayed. The two upper plots will change if a signal is selected. -#### Metrics subtab +#### Scores + +**M²ara** comes with a variety of helpful scores/metrics that are meant to help judging the quality of response curves. + +##### Modified Z': + +In pharmaceutical industry and research, the quality of a bioassay is assessed by common metrics that rely on a negative and positive control [Zhang et al., Iversen et al., Ravkin et al.] . However, in order to be able to explore unknown cellular drug effects in whole-cell MALDI MS bioassays and to classify m/z features as either up-, down- or non-regulated, characteristic measures need to be deduced from the concentration response data directly. First, to assess the variability within the assay data relative to the effective window size, a modified form of the Z’ factor (Zhang, Chung and Oldenburg 1999), defined by $$ +Z'_{mod.} = 1-\frac{3*(\sigma_u+\sigma_l)}{| \mu_u-\mu_l |} +$$ is implemented into **M²ara**. The mod. Z' score helps to make a judgment about the distance of the means ($\mu$, more is better) and standard deviation ($\sigma$, less is better) of the upper ($_u$) and lower ($l$) end of the curve. + +##### Modified V': + +The modified *V*’ [Ravkin et al.] 
is introduced to assess the root-mean-square deviation of the response data relative to the log-logistic model fit, determined by +$$ +V'_{mod.}=1-6*\frac{\sigma_f}{|a_u-a_l|} +$$ + +with + +$$ +\sigma_f=\sqrt{\frac{1}{N}\sum(f_exp-f)^2} +$$ + +where $\sigma_f$ is the standard deviation of the residuals of the 4-parameter non-linear regression model *f* calculated from the experimental (exp) data and the model. Hereby, the modified *V*’ factor reflects the goodness of the fit and thus the variance within all data points described by the model. +In short: V' focuses more on the goodness of fit of the curve to the data points. + +##### Log2-Fold-Change + +The $Log_2FC$ denotes the magnitude i.e. effect size of a response. It is defined as: + +$$ +Log_2 FC =log_2\frac{a_u}{a_l} +$$ +where $a_u$ and $a_l$ the upper and lower asymptotes. +In short: The $Log_2FC$ gives the raw (no variation of data points considered) difference between the upper and lower part of the curve. + +##### SSMD + +The Strictly Standardized Mean Difference (SSMD), is implemented (Bray and Carpenter 2004; Zhang 2007), with: + +$$ +SSMD = \frac{|\mu_l-\mu_u|}{\sqrt{\sigma^2_u+\sigma^2_l}} +$$ +In short: The SSMD gives the difference between the upper and lower part of the curves in units of standard deviation. Or in other words, it gives a weigthed differences. 
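The Log2-fold-change and SSMD definitions above translate directly into base R. The sketch below is illustrative only: the function names are hypothetical, `a_u`/`a_l` are assumed to be the fitted upper and lower asymptotes, and `upper`/`lower` are assumed to be replicate intensities at the two ends of the curve (sample variance via `var()`).

```r
# Log2 fold change between the upper and lower asymptote of the fit
log2_fc <- function(a_u, a_l) {
  log2(a_u / a_l)
}

# Strictly standardized mean difference between the two ends of the curve
ssmd <- function(upper, lower) {
  abs(mean(lower) - mean(upper)) / sqrt(var(upper) + var(lower))
}

# Made-up example values
fc <- log2_fc(a_u = 8, a_l = 1)
s  <- ssmd(upper = c(9.5, 10.0, 10.5, 10.0),
           lower = c(1.0, 1.5, 0.5, 1.0))
```

Here `fc` is 3 (an eight-fold change), while `s` additionally accounts for the replicate spread, which is why the two scores can rank features differently.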
+
+##### Curve-response-score (CRS)
+
+$$
+CRS=
+\begin{cases}
+\frac{fcScore+vScore+zScore}{3}*100,\\
+0 \quad for \quad Z'_{mod.}<-0.5 \quad or \quad V'_{mod.}<-0.5
+\end{cases}
+$$
+with
+
+$$
+fcScore=
+\begin{cases}
+1 \quad for \quad |log_2FC| > log_2FC_{max}\\
+\frac{|log_2FC|}{log_2FC_{max}} \quad otherwise
+\end{cases}
+$$
+and
+$$
+vScore=V'_{mod.}
+$$
+and
+$$
+zScore=
+\begin{cases}
+1 \quad for \quad Z'_{mod.}>0.5\\
+\frac{Z'_{mod.}}{0.5} \quad for \quad 0.5 > Z'_{mod.}>-0.5
+\end{cases}
+$$
+The CRS combines three measures used to describe the quality of a response curve: the effect size defined as $Log_2FC$ and incorporated in the fcScore, the $V'_{mod.}$ factor being equal to the vScore and the $Z'_{mod.}$ factor used in the definition of the zScore. In the fcScore, the $Log_2FC$ is normalized by and thresholded at $Log_2FC_{max}$=2.59. This threshold is chosen so that features exhibiting substantial changes are not overrated. The restriction of the $Z'_{mod.}$ factor to the zScore is made due to the common interpretation of the Z’ factor (Zhang, Chung and Oldenburg 1999). For $Z'_{mod.}>0.5$ a bioassay is said to be excellent, since for $\sigma_l=\sigma_u$ a value of 0.5 is equivalent to a separation of 12 standard deviations between $\mu_u$ and $\mu_l$. Accordingly, a value of -0.5 is equivalent to a separation of 4 standard deviations between $\mu_u$ and $\mu_l$ for $\sigma_l=\sigma_u$. The rather moderate lower threshold is of particular importance for MALDI MS-based bioassays exhibiting a relatively high variance in the data.
+
+### Metrics subtab
 The metrics screen enables to visualize different metrics (Z', V', SSMD, logFC, CRS as well as pEC50, etc.) as a function of **m/z**. The direction of the peaks (up or down) highlights the direction of regulation (if the intensity of the signal increases or decreases with the concentration). It is therefor useful to get a fast overview of the whole data set.
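How the CRS from the Scores section combines the individual metrics can be sketched as follows. This is a hedged illustration rather than M2ara's implementation: `crs()` is a hypothetical function, and it reads the fcScore denominator as $Log_2FC_{max}$ (2.59), as the accompanying description of the normalization states.

```r
# Curve response score from log2 fold change, modified V' and modified Z'
crs <- function(log2fc, v_mod, z_mod, log2fc_max = 2.59) {
  # Curves failing either quality threshold score zero
  if (z_mod < -0.5 || v_mod < -0.5) {
    return(0)
  }
  fc_score <- min(abs(log2fc) / log2fc_max, 1)     # capped at 1
  v_score  <- v_mod
  z_score  <- if (z_mod > 0.5) 1 else z_mod / 0.5  # capped at 1
  (fc_score + v_score + z_score) / 3 * 100
}

strong   <- crs(log2fc = 3,     v_mod = 0.9, z_mod = 0.8)   # capped fcScore and zScore
moderate <- crs(log2fc = 1.295, v_mod = 0.5, z_mod = 0.25)  # all three sub-scores are 0.5
rejected <- crs(log2fc = 3,     v_mod = 0.9, z_mod = -0.6)  # below the Z' threshold
```

Capping the fcScore and zScore at 1 means that a very large fold change cannot compensate for a poor fit or noisy replicates.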
The different metrics concentrate on different aspects of the quality of the curve. -### QC tab +## QC tab The top part of the OC tab focuses on the (potential) peak used for re-calibration and enables the user to inspect the alignment of the (average) spectra per concentration. @@ -149,7 +238,7 @@ The lower right shows processing (and in case of Bruker data also some measureme

Example of QC-tab

-### PCA tab +## PCA tab A PCA (Principle component analysis) enables a multivariate view to the data by dimensional reduction. Although, on its own its hard to identify biomarkers/regulated signals with it, the PCA is highly useful to judge the general concentration-dependent differences introduced by the treatment. A high separation of the different concentrations shows that some multivariate effects are in place were-as a low separation hints at either low effects overall or effects that are unique to some single (and most likely rather small) peaks. This is why the PCA can be a nice addition to the univariate analysis featured on the [Curves](#curve-screen)-screen The PCA can be generated by clicking on the `Perform PCA`-Button. @@ -169,7 +258,7 @@ The loading's can used to identify peaks that have a high influence to the score Using the `Summarise loadings`-button either the summarized (see figure above) or full (in a loadings vs **m/z** spectrum) loading's can be visualized. Using the `Send to peak table`-button the numeric loading's can be send to the peak table on the [Curve](#curve-screen)-screen to investigate easily if the overlap with univariate signals of interest (high scores in Z', V' or CRS) or if the represent a separate regulation cause by many smaller changes not strong enough to lead to high scores on their own. -### Cluster tab +## Cluster tab The cluster tab enables to cluster curves based on their shape to enable to detect signals of interest that follow a similar direction as one (or many) target signals. @@ -187,7 +276,7 @@ Using the slider the user needs to adjust the number of clusters to a reasonable

Clustering metrics

-### Settings tab
+## Settings tab
Settings tab @@ -202,7 +291,7 @@ The `Peak window size` and `Peak method` setting enables to change the peak dete The `Exclude empty spectra` setting will exclude spectra that don't contain any signals. -#### Saving processing parameters +### Saving processing parameters To save results for a later usage the app includes the option to save all relevant processing parameters. This can be done by clicking: `Settings` -\> `Save settings`. If also the path to the data should be saved this needs to be after setting the directory but before loading the spectra. @@ -212,7 +301,7 @@ If such a file is found at the start-up of the app, the parameters will be loade As processing is typically fast, this is a more efficient (time & disk-space) process then to save the complete app-state including spectra and calculated values. -### Save fitting parameters +## Save fitting parameters The curve fitting in the app is internally performed by the [nplr-package](https://github.com/fredcommo/nplr) that used the Richardson Formula for Logistic regression: From 6bf06d952ef9b8ceaf46392972792673f7438fe3 Mon Sep 17 00:00:00 2001 From: thomas-enzlein Date: Thu, 15 Aug 2024 09:57:54 +0200 Subject: [PATCH 02/13] - bump version --- DESCRIPTION | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index eaee3ef..6c12af2 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: M2ara Title: A shiny GUI to explore concentration-responses in MALDI-based assays -Version: 1.4.0 +Version: 1.4.1 Authors@R: person("Thomas", "Enzlein", , "t.enzlein@hs-mannheim.de", role = c("aut", "cre"), comment = c(ORCID = "0000-0003-1789-4090")) From 8409d82942c7967d36659c395b16183282f606f6 Mon Sep 17 00:00:00 2001 From: Thomas Enzlein <70519530+thomas-enzlein@users.noreply.github.com> Date: Fri, 16 Aug 2024 11:35:45 +0200 Subject: [PATCH 03/13] Update settings_mzML_data.csv --- tests/testthat/settings_mzML_data.csv | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) 
diff --git a/tests/testthat/settings_mzML_data.csv b/tests/testthat/settings_mzML_data.csv index 1a24aa4..225f7e6 100644 --- a/tests/testthat/settings_mzML_data.csv +++ b/tests/testthat/settings_mzML_data.csv @@ -1,2 +1,2 @@ "concUnits","avgMethod","normMeth","VarFilterMethod","errorbars","metric","plateStat","pcaX","pcaY","pcaEllipse","fileFormat","peakMethod","zoom","pcaAlpha","pcaBeta","num_cluster","SinglePointRecal","plateScale","simpleLoadings","checkEmpty","SNR","normMz","normTol","alignTol","binTol","halfWindowSize","smooth","rmBl","sqrtTrans","monoisotopicFilter","dir" -"uM","mean","mz","none","none","CRS","Recal-shift","PC1","PC2","0.67","mzml","SuperSmoother",4,-3,-3,4,TRUE,FALSE,TRUE,TRUE,3,354.1418,0.1,0,100,20,TRUE,TRUE,FALSE,FALSE,"mzML" +"uM","mean","mz","none","none","CRS","Recal-shift","PC1","PC2","0.67","mzml","SuperSmoother",4,-3,-3,4,TRUE,FALSE,TRUE,TRUE,3,354.1418,0.1,0,100,3,TRUE,TRUE,FALSE,FALSE,"mzML" From 50d86948e368cbe490f024bb491a28118b2c2387 Mon Sep 17 00:00:00 2001 From: Thomas Enzlein <70519530+thomas-enzlein@users.noreply.github.com> Date: Fri, 16 Aug 2024 11:36:12 +0200 Subject: [PATCH 04/13] Update settings_mzML_data.csv --- tests/testthat/settings_mzML_data.csv | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/testthat/settings_mzML_data.csv b/tests/testthat/settings_mzML_data.csv index 225f7e6..97c352a 100644 --- a/tests/testthat/settings_mzML_data.csv +++ b/tests/testthat/settings_mzML_data.csv @@ -1,2 +1,2 @@ "concUnits","avgMethod","normMeth","VarFilterMethod","errorbars","metric","plateStat","pcaX","pcaY","pcaEllipse","fileFormat","peakMethod","zoom","pcaAlpha","pcaBeta","num_cluster","SinglePointRecal","plateScale","simpleLoadings","checkEmpty","SNR","normMz","normTol","alignTol","binTol","halfWindowSize","smooth","rmBl","sqrtTrans","monoisotopicFilter","dir" 
-"uM","mean","mz","none","none","CRS","Recal-shift","PC1","PC2","0.67","mzml","SuperSmoother",4,-3,-3,4,TRUE,FALSE,TRUE,TRUE,3,354.1418,0.1,0,100,3,TRUE,TRUE,FALSE,FALSE,"mzML" +"uM","mean","mz","none","none","CRS","Recal-shift","PC1","PC2","0.67","mzml","SuperSmoother",4,-3,-3,4,FALSE,FALSE,TRUE,TRUE,3,354.1418,0.1,0,100,3,TRUE,TRUE,FALSE,FALSE,"mzML" From 1c17b9df921eba5aea40ac6f960b005742b44501 Mon Sep 17 00:00:00 2001 From: Thomas Enzlein <70519530+thomas-enzlein@users.noreply.github.com> Date: Fri, 16 Aug 2024 11:36:32 +0200 Subject: [PATCH 05/13] Update settings_mzML_data.csv --- tests/testthat/settings_mzML_data.csv | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/testthat/settings_mzML_data.csv b/tests/testthat/settings_mzML_data.csv index 97c352a..aef310e 100644 --- a/tests/testthat/settings_mzML_data.csv +++ b/tests/testthat/settings_mzML_data.csv @@ -1,2 +1,2 @@ "concUnits","avgMethod","normMeth","VarFilterMethod","errorbars","metric","plateStat","pcaX","pcaY","pcaEllipse","fileFormat","peakMethod","zoom","pcaAlpha","pcaBeta","num_cluster","SinglePointRecal","plateScale","simpleLoadings","checkEmpty","SNR","normMz","normTol","alignTol","binTol","halfWindowSize","smooth","rmBl","sqrtTrans","monoisotopicFilter","dir" -"uM","mean","mz","none","none","CRS","Recal-shift","PC1","PC2","0.67","mzml","SuperSmoother",4,-3,-3,4,FALSE,FALSE,TRUE,TRUE,3,354.1418,0.1,0,100,3,TRUE,TRUE,FALSE,FALSE,"mzML" +"uM","mean","mz","none","none","CRS","Recal-shift","PC1","PC2","0.67","mzml","SuperSmoother",4,-3,-3,4,FALSE,FALSE,TRUE,TRUE,3,354.1418,0.1,0,100,3,FALSE,TRUE,FALSE,FALSE,"mzML" From 8a186b5df76b39c7ed2fb64f067ccb027114520b Mon Sep 17 00:00:00 2001 From: Thomas Enzlein <70519530+thomas-enzlein@users.noreply.github.com> Date: Fri, 16 Aug 2024 11:41:05 +0200 Subject: [PATCH 06/13] Update README.md --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index 280da67..28a9266 100644 --- a/README.md +++ b/README.md 
@@ -75,6 +75,8 @@ To replicate the results shown use the following parameters: - set binning tolerance to 100 ppm - select the folder `mzML` (parent folder of the mzML files) from the .zip file, please make sure that no other files are in this folder. +Alternatively, copy the [this file](https://github.com/CeMOS-Mannheim/M2ara/blob/main/tests/testthat/settings_mzML_data.csv) as `settings.csv` into the main folder of the app. + The target *m/z* is 349.11 (E3S, [M-H]-) the pIC50 value should be 6.1. #### Weigt2018_BCR-ABL_inhibition_Dasatinib_BrukerFlex.zip @@ -95,5 +97,7 @@ To replicate the results shown use the following parameters: - set binning tolerance to 100 ppm - select the the folder `curve` from the .zip file, make sure no other files/folders are present. +Alternatively, copy the [this file](https://github.com/CeMOS-Mannheim/M2ara/blob/main/tests/testthat/settings_bruker_data.csv) as `settings.csv` into the main folder of the app. + The target is *m/z* 826.5722 (PC(36:1) [M+K]+) and *m/z* 616.1767 (Heme B [M+H]+) the pIC50 values should be 9.5 and 9.7. 
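Before copying one of the referenced settings files into the app folder, it can help to inspect its contents. The sketch below reads a one-row settings CSV in base R. The temporary file and the reduced column set are assumptions for illustration (the names are a subset of those in `tests/testthat/settings_mzML_data.csv`), and how M2ara itself parses `settings.csv` is not shown here.

```r
# Write a reduced stand-in for the settings file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  '"concUnits","normMeth","SNR","normMz","normTol","binTol","halfWindowSize","smooth"',
  '"uM","mz",3,354.1418,0.1,100,3,FALSE'
), tmp)

# One row, one column per processing parameter
settings <- read.csv(tmp, stringsAsFactors = FALSE)
str(settings)
```

Reading the file this way makes it easy to confirm, for example, that the binning tolerance is 100 ppm as listed in the replication parameters.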
From 75fcab73740c5af8c81d03d4ac02d9eaa258d8d6 Mon Sep 17 00:00:00 2001 From: Thomas Enzlein <70519530+thomas-enzlein@users.noreply.github.com> Date: Fri, 16 Aug 2024 12:01:39 +0200 Subject: [PATCH 07/13] Update manual.Rmd --- manual.Rmd | 1 - 1 file changed, 1 deletion(-) diff --git a/manual.Rmd b/manual.Rmd index 3eba47d..de8437b 100644 --- a/manual.Rmd +++ b/manual.Rmd @@ -5,7 +5,6 @@ date: "09.08.2024" output: html_document: mathjax: "//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" -runtime: shiny --- ```{r setup, include=FALSE} From c82fa3ef4b5999c785ae209a8703f455c2ea79d2 Mon Sep 17 00:00:00 2001 From: Thomas Enzlein <70519530+thomas-enzlein@users.noreply.github.com> Date: Fri, 16 Aug 2024 15:22:56 +0200 Subject: [PATCH 08/13] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 28a9266..d1956ac 100644 --- a/README.md +++ b/README.md @@ -52,7 +52,7 @@ docker run -p 3838:3838 -v c:/path/to/massSpecData:/mnt thomasenzlein/m2ara:mai ### Stand-alone installer for Windows Use the stand-alone installer (Windows only, no R installation needed). -The installer can be downloaded [here](https://github.com/CeMOS-Mannheim/M2ara/releases/download/1.2/MALDIcellassay_1.2.exe). +The installer can be downloaded [here](https://github.com/CeMOS-Mannheim/M2ara/releases/download/1.4.1/M2ara_1.4.1.exe). ## Example data To test the app please use the example data on [FigShare](https://dx.doi.org/10.6084/m9.figshare.25736541). From 068e145257e00da47950d140f84a449d66057798 Mon Sep 17 00:00:00 2001 From: Thomas Enzlein <70519530+thomas-enzlein@users.noreply.github.com> Date: Sat, 17 Aug 2024 09:39:01 +0200 Subject: [PATCH 09/13] Update manual.Rmd --- manual.Rmd | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/manual.Rmd b/manual.Rmd index de8437b..073c968 100644 --- a/manual.Rmd +++ b/manual.Rmd @@ -156,6 +156,7 @@ $$ is implemented into **M²ara**. The mod. 
Z' score helps to make a judgment ab ##### Modified V': The modified *V*’ [Ravkin et al.] is introduced to assess the root-mean-square deviation of the response data relative to the log-logistic model fit, determined by + $$ V'_{mod.}=1-6*\frac{\sigma_f}{|a_u-a_l|} $$ @@ -163,7 +164,7 @@ $$ with $$ -\sigma_f=\sqrt{\frac{1}{N}\sum(f_exp-f)^2} +\sigma_f=\sqrt{\frac{1}{N}\sum(f_{exp}-f)^2} $$ where $\sigma_f$ is the standard deviation of the residuals of the 4-parameter non-linear regression model *f* calculated from the experimental (exp) data and the model. Hereby, the modified *V*’ factor reflects the goodness of the fit and thus the variance within all data points described by the model. From c62293a85e5207a4b2967a121237ac64bde5e618 Mon Sep 17 00:00:00 2001 From: Thomas Enzlein Date: Sat, 17 Aug 2024 10:37:03 +0200 Subject: [PATCH 10/13] fixes for inline latex in manual --- manual.Rmd | 32 ++++++++++++++++++-------------- manual.md | 36 ++++++++++++++++++++---------------- 2 files changed, 38 insertions(+), 30 deletions(-) diff --git a/manual.Rmd b/manual.Rmd index 073c968..5dbaba5 100644 --- a/manual.Rmd +++ b/manual.Rmd @@ -29,7 +29,7 @@ The following features were already part of `MALDIcellassay`: - graphical user interface - interactive data exploration - support for [mzML](#mzml) data \* -- calculation of quality metrics (Z', V', log2FC, CRS) \* +- calculation of quality metrics (*Z'*, *V'*, *log2FC*, *CRS*) \* - feature ranking by metric \* - principle component analysis (PCA) - curve clustering @@ -121,7 +121,7 @@ The analysis pipeline consist of the following steps (see figure below for a gra 9. `Intensity matrix`: The peaks of the average spectra are transformed into a matrix with columns representing *m/z* values and rows representing concentrations whereas cells contain the respective intensity. 10. `Varience filtering` is applied. 11. `Curve fitting` is performed. -12. `Quality metrics` are calculated (V', Z', SSMD, Log2FC, CRS). +12. 
`Quality metrics` are calculated (*V'*, *Z'*, *SSMD*, *Log2FC*, *CRS*). 13. The peaks can be selected in the `Peak table`. 14. The respective dose-response curve as well as the peak profile is visualized and might be saved. @@ -149,13 +149,17 @@ Below the two plots the peak table is shown. Here all found signals as well as a ##### Modified Z': -In pharmaceutical industry and research, the quality of a bioassay is assessed by common metrics that rely on a negative and positive control [Zhang et al., Iversen et al., Ravkin et al.] . However, in order to be able to explore unknown cellular drug effects in whole-cell MALDI MS bioassays and to classify m/z features as either up-, down- or non-regulated, characteristic measures need to be deduced from the concentration response data directly. First, to assess the variability within the assay data relative to the effective window size, a modified form of the Z’ factor (Zhang, Chung and Oldenburg 1999), defined by $$ -Z'_{mod.} = 1-\frac{3*(\sigma_u+\sigma_l)}{| \mu_u-\mu_l |} -$$ is implemented into **M²ara**. The mod. Z' score helps to make a judgment about the distance of the means ($\mu$, more is better) and standard deviation ($\sigma$, less is better) of the upper ($_u$) and lower ($l$) end of the curve. +In pharmaceutical industry and research, the quality of a bioassay is assessed by common metrics that rely on a negative and positive control [Zhang et al., 1999](https://pubmed.ncbi.nlm.nih.gov/10838414/), [Iversen et al., 2006](https://doi.org/10.1177/1087057105285610), [Ravkin et al., 2004](http://www.ravkin.net/articles/5322-7.pdf) . However, in order to be able to explore unknown cellular drug effects in whole-cell MALDI MS bioassays and to classify m/z features as either up-, down- or non-regulated, characteristic measures need to be deduced from the concentration response data directly. 
First, to assess the variability within the assay data relative to the effective window size, a modified form of the *Z’* factor [Zhang et al., 1999](https://pubmed.ncbi.nlm.nih.gov/10838414/), defined by + +$$ +Z'_{mod.} = 1-\frac{3*(\sigma_u+\sigma_l)}{|\mu_u-\mu_l|} +$$ + +is implemented into **M²ara**. The modified *Z'* score helps to make a judgment about the distance of the means ( $\mu$ , more is better) and standard deviation ( $\sigma$ , less is better) of the upper ( $_u$ ) and lower ( $_l$ ) end of the curve. ##### Modified V': -The modified *V*’ [Ravkin et al.] is introduced to assess the root-mean-square deviation of the response data relative to the log-logistic model fit, determined by +The modified *V'* [Ravkin et al., 2004](http://www.ravkin.net/articles/5322-7.pdf) is introduced to assess the root-mean-square deviation of the response data relative to the log-logistic model fit, determined by $$ V'_{mod.}=1-6*\frac{\sigma_f}{|a_u-a_l|} @@ -167,8 +171,8 @@ $$ \sigma_f=\sqrt{\frac{1}{N}\sum(f_{exp}-f)^2} $$ -where $\sigma_f$ is the standard deviation of the residuals of the 4-parameter non-linear regression model *f* calculated from the experimental (exp) data and the model. Hereby, the modified *V*’ factor reflects the goodness of the fit and thus the variance within all data points described by the model. -In short: V' focuses more on the goodness of fit of the curve to the data points. +where $\sigma_f$ is the standard deviation of the residuals of the 4-parameter non-linear regression model *f* calculated from the experimental (exp) data and the model. Hereby, the modified *V'* factor reflects the goodness of the fit and thus the variance within all data points described by the model. +In short: *V'* focuses more on the goodness of fit of the curve to the data points. 
##### Log2-Fold-Change @@ -182,12 +186,12 @@ In short: The $Log_2FC$ gives the raw (no variation of data points considered) d ##### SSMD -The Strictly Standardized Mean Difference (SSMD), is implemented (Bray and Carpenter 2004; Zhang 2007), with: +The Strictly Standardized Mean Difference (*SSMD*), is implemented [Bray and Carpenter 2004](https://pubmed.ncbi.nlm.nih.gov/23469374/); [Zhang et al., 2007](https://doi.org/10.1016/j.ygeno.2006.12.014), with: $$ SSMD = \frac{|\mu_l-\mu_u|}{\sqrt{\sigma^2_u+\sigma^2_l}} $$ -In short: The SSMD gives the difference between the upper and lower part of the curves in units of standard deviation. Or in other words, it gives a weigthed differences. +In short: The *SSMD* gives the difference between the upper and lower part of the curves in units of standard deviation. Or in other words, it gives a weigthed differences. ##### Curve-repsonse-score (CRS) @@ -219,17 +223,17 @@ zScore= \frac{Z'_{mod.}}{0.5} \quad for \quad 0.5 > Z'_{mod.}>-0.5 \end{cases} $$ -The CRS combines three measures used to describe the quality of a response curve, the effect size defined as $Log_2FC$ and incorporated in the fcScore, the $V'_{mod.}$ factor being equal to the vScore and the $Z'_{mod.}$ factor used in the definition of the zScore. In the fcScore, the $Log_2FC$ is normalized by and thresholded at $Log_2FC_{max}$=2.59. The factor is chosen to not overrate features that exhibit substantial changes. The restriction of the $Z'_{mod.}$ factor to the zScore is made due to the common interpretation of the Z’ factor (Zhang, Chung and Oldenburg 1999). For $Z'_{mod.}>0.5$ a bioassay is said to be excellent, since for $\sigma_l=\sigma_u$ a value of 0.5 is equivalent to a separation of 12 standard deviations between $\mu_u$ and $\mu_l$. Accordingly, a value of -0.5 is equivalent to a separation of 3 standard deviations between $\mu_u$ and $\mu_l$ for $\sigma_l=\sigma_u$. 
The rather moderate lower threshold is in particular of importance for MALDI MS-based bioassay exhibiting a relatively high variance in the data. +The *CRS* combines three measures used to describe the quality of a response curve, the effect size defined as $Log_2FC$ and incorporated in the fcScore, the $V'_{mod.}$ factor being equal to the vScore and the $Z'_{mod.}$ factor used in the definition of the zScore. In the fcScore, the $Log_2FC$ is normalized by and thresholded at $Log_2FC_{max}=2.59$ . The factor is chosen to not overrate features that exhibit substantial changes. The restriction of the $Z'_{mod.}$ factor to the zScore is made due to the common interpretation of the *Z’* factor (Zhang, Chung and Oldenburg 1999). For $Z'_{mod.}>0.5$ a bioassay is said to be excellent, since for $\sigma_l=\sigma_u$ a value of 0.5 is equivalent to a separation of 12 standard deviations between $\mu_u$ and $\mu_l$ . Accordingly, a value of -0.5 is equivalent to a separation of 3 standard deviations between $\mu_u$ and $\mu_l$ for $\sigma_l=\sigma_u$ . The rather moderate lower threshold is in particular of importance for MALDI MS-based bioassay exhibiting a relatively high variance in the data. ### Metrics subtab -The metrics screen enables to visualize different metrics (Z', V', SSMD, logFC, CRS as well as pEC50, etc.) as a function of **m/z**. The direction of the peaks (up or down) highlights the direction of regulation (if the intensity of the signal increases or decreases with the concentration). It is therefor useful to get a fast overview of the whole data set. The different metrics concentrate on different aspects of the quality of the curve. +The metrics screen enables to visualize different metrics (*Z'*, *V'*, *SSMD*, *logFC*, *CRS* as well as pEC50, etc.) as a function of **m/z**. The direction of the peaks (up or down) highlights the direction of regulation (if the intensity of the signal increases or decreases with the concentration). 
It is therefor useful to get a fast overview of the whole data set. The different metrics concentrate on different aspects of the quality of the curve. ## QC tab The top part of the OC tab focuses on the (potential) peak used for re-calibration and enables the user to inspect the alignment of the (average) spectra per concentration. -The lower left part shows different metrics (both assay quality metrics like Z', V', CRS and MALDI parameters like total ion current as well as re-calibration shifts and PCA loadings) per spot in a target plate view. **This functionality is currently only featured for Bruker raw data. And wont be visible with the `mzML` input file format selected.** +The lower left part shows different metrics (both assay quality metrics like *Z'*, *V'*, *CRS* and MALDI parameters like total ion current as well as re-calibration shifts and PCA loadings) per spot in a target plate view. **This functionality is currently only featured for Bruker raw data. And wont be visible with the `mzML` input file format selected.** The lower right shows processing (and in case of Bruker data also some measurement meta data) as a summary. @@ -253,7 +257,7 @@ The loading's can used to identify peaks that have a high influence to the score knitr::include_graphics("figures/pca_loadings.png") ``` -Using the `Summarise loadings`-button either the summarized (see figure above) or full (in a loadings vs **m/z** spectrum) loading's can be visualized. Using the `Send to peak table`-button the numeric loading's can be send to the peak table on the [Curve](#curve-screen)-screen to investigate easily if the overlap with univariate signals of interest (high scores in Z', V' or CRS) or if the represent a separate regulation cause by many smaller changes not strong enough to lead to high scores on their own. +Using the `Summarise loadings`-button either the summarized (see figure above) or full (in a loadings vs **m/z** spectrum) loading's can be visualized. 
Using the `Send to peak table`-button the numeric loading's can be send to the peak table on the [Curve](#curve-screen)-screen to investigate easily if the overlap with univariate signals of interest (high scores in *Z'*, *V'* or *CRS*) or if the represent a separate regulation cause by many smaller changes not strong enough to lead to high scores on their own. ## Cluster tab diff --git a/manual.md b/manual.md index 7273b5e..7a09646 100644 --- a/manual.md +++ b/manual.md @@ -5,7 +5,6 @@ date: "09.08.2024" output: html_document: mathjax: "//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" -runtime: shiny --- @@ -28,7 +27,7 @@ The following features were already part of `MALDIcellassay`: - graphical user interface - interactive data exploration - support for [mzML](#mzml) data \* -- calculation of quality metrics (Z', V', log2FC, CRS) \* +- calculation of quality metrics (*Z'*, *V'*, *log2FC*, *CRS*) \* - feature ranking by metric \* - principle component analysis (PCA) - curve clustering @@ -121,7 +120,7 @@ The analysis pipeline consist of the following steps (see figure below for a gra 9. `Intensity matrix`: The peaks of the average spectra are transformed into a matrix with columns representing *m/z* values and rows representing concentrations whereas cells contain the respective intensity. 10. `Varience filtering` is applied. 11. `Curve fitting` is performed. -12. `Quality metrics` are calculated (V', Z', SSMD, Log2FC, CRS). +12. `Quality metrics` are calculated (*V'*, *Z'*, *SSMD*, *Log2FC*, *CRS*). 13. The peaks can be selected in the `Peak table`. 14. The respective dose-response curve as well as the peak profile is visualized and might be saved. @@ -150,13 +149,18 @@ Below the two plots the peak table is shown. 
Here all found signals as well as a ##### Modified Z': -In pharmaceutical industry and research, the quality of a bioassay is assessed by common metrics that rely on a negative and positive control [Zhang et al., Iversen et al., Ravkin et al.] . However, in order to be able to explore unknown cellular drug effects in whole-cell MALDI MS bioassays and to classify m/z features as either up-, down- or non-regulated, characteristic measures need to be deduced from the concentration response data directly. First, to assess the variability within the assay data relative to the effective window size, a modified form of the Z’ factor (Zhang, Chung and Oldenburg 1999), defined by $$ -Z'_{mod.} = 1-\frac{3*(\sigma_u+\sigma_l)}{| \mu_u-\mu_l |} -$$ is implemented into **M²ara**. The mod. Z' score helps to make a judgment about the distance of the means ($\mu$, more is better) and standard deviation ($\sigma$, less is better) of the upper ($_u$) and lower ($l$) end of the curve. +In pharmaceutical industry and research, the quality of a bioassay is assessed by common metrics that rely on a negative and positive control [Zhang et al., 1999](https://pubmed.ncbi.nlm.nih.gov/10838414/), [Iversen et al., 2006](https://doi.org/10.1177/1087057105285610), [Ravkin et al., 2004](http://www.ravkin.net/articles/5322-7.pdf) . However, in order to be able to explore unknown cellular drug effects in whole-cell MALDI MS bioassays and to classify m/z features as either up-, down- or non-regulated, characteristic measures need to be deduced from the concentration response data directly. First, to assess the variability within the assay data relative to the effective window size, a modified form of the *Z’* factor [Zhang et al., 1999](https://pubmed.ncbi.nlm.nih.gov/10838414/), defined by + +$$ +Z'_{mod.} = 1-\frac{3*(\sigma_u+\sigma_l)}{|\mu_u-\mu_l|} +$$ + +is implemented into **M²ara**. 
The modified *Z'* score helps to make a judgment about the distance of the means ( $\mu$ , more is better) and standard deviation ( $\sigma$ , less is better) of the upper ( $_u$ ) and lower ( $_l$ ) end of the curve. ##### Modified V': -The modified *V*’ [Ravkin et al.] is introduced to assess the root-mean-square deviation of the response data relative to the log-logistic model fit, determined by +The modified *V'* [Ravkin et al., 2004](http://www.ravkin.net/articles/5322-7.pdf) is introduced to assess the root-mean-square deviation of the response data relative to the log-logistic model fit, determined by + $$ V'_{mod.}=1-6*\frac{\sigma_f}{|a_u-a_l|} $$ @@ -164,11 +168,11 @@ $$ with $$ -\sigma_f=\sqrt{\frac{1}{N}\sum(f_exp-f)^2} +\sigma_f=\sqrt{\frac{1}{N}\sum(f_{exp}-f)^2} $$ -where $\sigma_f$ is the standard deviation of the residuals of the 4-parameter non-linear regression model *f* calculated from the experimental (exp) data and the model. Hereby, the modified *V*’ factor reflects the goodness of the fit and thus the variance within all data points described by the model. -In short: V' focuses more on the goodness of fit of the curve to the data points. +where $\sigma_f$ is the standard deviation of the residuals of the 4-parameter non-linear regression model *f* calculated from the experimental (exp) data and the model. Hereby, the modified *V'* factor reflects the goodness of the fit and thus the variance within all data points described by the model. +In short: *V'* focuses more on the goodness of fit of the curve to the data points. 
##### Log2-Fold-Change @@ -182,12 +186,12 @@ In short: The $Log_2FC$ gives the raw (no variation of data points considered) d ##### SSMD -The Strictly Standardized Mean Difference (SSMD), is implemented (Bray and Carpenter 2004; Zhang 2007), with: +The Strictly Standardized Mean Difference (*SSMD*), is implemented [Bray and Carpenter 2004](https://pubmed.ncbi.nlm.nih.gov/23469374/); [Zhang et al., 2007](https://doi.org/10.1016/j.ygeno.2006.12.014), with: $$ SSMD = \frac{|\mu_l-\mu_u|}{\sqrt{\sigma^2_u+\sigma^2_l}} $$ -In short: The SSMD gives the difference between the upper and lower part of the curves in units of standard deviation. Or in other words, it gives a weigthed differences. +In short: The *SSMD* gives the difference between the upper and lower part of the curves in units of standard deviation. Or in other words, it gives a weigthed differences. ##### Curve-repsonse-score (CRS) @@ -219,17 +223,17 @@ zScore= \frac{Z'_{mod.}}{0.5} \quad for \quad 0.5 > Z'_{mod.}>-0.5 \end{cases} $$ -The CRS combines three measures used to describe the quality of a response curve, the effect size defined as $Log_2FC$ and incorporated in the fcScore, the $V'_{mod.}$ factor being equal to the vScore and the $Z'_{mod.}$ factor used in the definition of the zScore. In the fcScore, the $Log_2FC$ is normalized by and thresholded at $Log_2FC_{max}$=2.59. The factor is chosen to not overrate features that exhibit substantial changes. The restriction of the $Z'_{mod.}$ factor to the zScore is made due to the common interpretation of the Z’ factor (Zhang, Chung and Oldenburg 1999). For $Z'_{mod.}>0.5$ a bioassay is said to be excellent, since for $\sigma_l=\sigma_u$ a value of 0.5 is equivalent to a separation of 12 standard deviations between $\mu_u$ and $\mu_l$. Accordingly, a value of -0.5 is equivalent to a separation of 3 standard deviations between $\mu_u$ and $\mu_l$ for $\sigma_l=\sigma_u$. 
The rather moderate lower threshold is in particular of importance for MALDI MS-based bioassay exhibiting a relatively high variance in the data. +The *CRS* combines three measures used to describe the quality of a response curve, the effect size defined as $Log_2FC$ and incorporated in the fcScore, the $V'_{mod.}$ factor being equal to the vScore and the $Z'_{mod.}$ factor used in the definition of the zScore. In the fcScore, the $Log_2FC$ is normalized by and thresholded at $Log_2FC_{max}=2.59$ . The factor is chosen to not overrate features that exhibit substantial changes. The restriction of the $Z'_{mod.}$ factor to the zScore is made due to the common interpretation of the *Z’* factor (Zhang, Chung and Oldenburg 1999). For $Z'_{mod.}>0.5$ a bioassay is said to be excellent, since for $\sigma_l=\sigma_u$ a value of 0.5 is equivalent to a separation of 12 standard deviations between $\mu_u$ and $\mu_l$ . Accordingly, a value of -0.5 is equivalent to a separation of 3 standard deviations between $\mu_u$ and $\mu_l$ for $\sigma_l=\sigma_u$ . The rather moderate lower threshold is in particular of importance for MALDI MS-based bioassay exhibiting a relatively high variance in the data. ### Metrics subtab -The metrics screen enables to visualize different metrics (Z', V', SSMD, logFC, CRS as well as pEC50, etc.) as a function of **m/z**. The direction of the peaks (up or down) highlights the direction of regulation (if the intensity of the signal increases or decreases with the concentration). It is therefor useful to get a fast overview of the whole data set. The different metrics concentrate on different aspects of the quality of the curve. +The metrics screen enables to visualize different metrics (*Z'*, *V'*, *SSMD*, *logFC*, *CRS* as well as pEC50, etc.) as a function of **m/z**. The direction of the peaks (up or down) highlights the direction of regulation (if the intensity of the signal increases or decreases with the concentration). 
It is therefor useful to get a fast overview of the whole data set. The different metrics concentrate on different aspects of the quality of the curve. ## QC tab The top part of the OC tab focuses on the (potential) peak used for re-calibration and enables the user to inspect the alignment of the (average) spectra per concentration. -The lower left part shows different metrics (both assay quality metrics like Z', V', CRS and MALDI parameters like total ion current as well as re-calibration shifts and PCA loadings) per spot in a target plate view. **This functionality is currently only featured for Bruker raw data. And wont be visible with the `mzML` input file format selected.** +The lower left part shows different metrics (both assay quality metrics like *Z'*, *V'*, *CRS* and MALDI parameters like total ion current as well as re-calibration shifts and PCA loadings) per spot in a target plate view. **This functionality is currently only featured for Bruker raw data. And wont be visible with the `mzML` input file format selected.** The lower right shows processing (and in case of Bruker data also some measurement meta data) as a summary. @@ -256,7 +260,7 @@ The loading's can used to identify peaks that have a high influence to the score

PCA loadings plot

-Using the `Summarise loadings`-button either the summarized (see figure above) or full (in a loadings vs **m/z** spectrum) loading's can be visualized. Using the `Send to peak table`-button the numeric loading's can be send to the peak table on the [Curve](#curve-screen)-screen to investigate easily if the overlap with univariate signals of interest (high scores in Z', V' or CRS) or if the represent a separate regulation cause by many smaller changes not strong enough to lead to high scores on their own. +Using the `Summarise loadings`-button either the summarized (see figure above) or full (in a loadings vs **m/z** spectrum) loading's can be visualized. Using the `Send to peak table`-button the numeric loading's can be send to the peak table on the [Curve](#curve-screen)-screen to investigate easily if the overlap with univariate signals of interest (high scores in *Z'*, *V'* or *CRS*) or if the represent a separate regulation cause by many smaller changes not strong enough to lead to high scores on their own. ## Cluster tab From 9327f75e84a3b1cc79f8e63a34d74feba2702409 Mon Sep 17 00:00:00 2001 From: Thomas Enzlein Date: Sat, 17 Aug 2024 10:40:49 +0200 Subject: [PATCH 11/13] more fixes --- manual.Rmd | 6 ++++++ manual.md | 6 ++++++ 2 files changed, 12 insertions(+) diff --git a/manual.Rmd b/manual.Rmd index 5dbaba5..5f0c3be 100644 --- a/manual.Rmd +++ b/manual.Rmd @@ -181,6 +181,7 @@ The $Log_2FC$ denotes the magnitude i.e. effect size of a response. It is define $$ Log_2 FC =log_2\frac{a_u}{a_l} $$ + where $a_u$ and $a_l$ the upper and lower asymptotes. In short: The $Log_2FC$ gives the raw (no variation of data points considered) difference between the upper and lower part of the curve. 
@@ -211,11 +212,15 @@ fcScore= \frac{|log_2FC|}{log_2FC} \end{cases} $$ + and + $$ vScore=V'_{mod.} $$ + and + $$ zScore= \begin{cases} @@ -223,6 +228,7 @@ zScore= \frac{Z'_{mod.}}{0.5} \quad for \quad 0.5 > Z'_{mod.}>-0.5 \end{cases} $$ + The *CRS* combines three measures used to describe the quality of a response curve, the effect size defined as $Log_2FC$ and incorporated in the fcScore, the $V'_{mod.}$ factor being equal to the vScore and the $Z'_{mod.}$ factor used in the definition of the zScore. In the fcScore, the $Log_2FC$ is normalized by and thresholded at $Log_2FC_{max}=2.59$ . The factor is chosen to not overrate features that exhibit substantial changes. The restriction of the $Z'_{mod.}$ factor to the zScore is made due to the common interpretation of the *Z’* factor (Zhang, Chung and Oldenburg 1999). For $Z'_{mod.}>0.5$ a bioassay is said to be excellent, since for $\sigma_l=\sigma_u$ a value of 0.5 is equivalent to a separation of 12 standard deviations between $\mu_u$ and $\mu_l$ . Accordingly, a value of -0.5 is equivalent to a separation of 3 standard deviations between $\mu_u$ and $\mu_l$ for $\sigma_l=\sigma_u$ . The rather moderate lower threshold is in particular of importance for MALDI MS-based bioassay exhibiting a relatively high variance in the data. ### Metrics subtab diff --git a/manual.md b/manual.md index 7a09646..b9d01a4 100644 --- a/manual.md +++ b/manual.md @@ -181,6 +181,7 @@ The $Log_2FC$ denotes the magnitude i.e. effect size of a response. It is define $$ Log_2 FC =log_2\frac{a_u}{a_l} $$ + where $a_u$ and $a_l$ the upper and lower asymptotes. In short: The $Log_2FC$ gives the raw (no variation of data points considered) difference between the upper and lower part of the curve. 
@@ -211,11 +212,15 @@ fcScore= \frac{|log_2FC|}{log_2FC} \end{cases} $$ + and + $$ vScore=V'_{mod.} $$ + and + $$ zScore= \begin{cases} @@ -223,6 +228,7 @@ zScore= \frac{Z'_{mod.}}{0.5} \quad for \quad 0.5 > Z'_{mod.}>-0.5 \end{cases} $$ + The *CRS* combines three measures used to describe the quality of a response curve, the effect size defined as $Log_2FC$ and incorporated in the fcScore, the $V'_{mod.}$ factor being equal to the vScore and the $Z'_{mod.}$ factor used in the definition of the zScore. In the fcScore, the $Log_2FC$ is normalized by and thresholded at $Log_2FC_{max}=2.59$ . The factor is chosen to not overrate features that exhibit substantial changes. The restriction of the $Z'_{mod.}$ factor to the zScore is made due to the common interpretation of the *Z’* factor (Zhang, Chung and Oldenburg 1999). For $Z'_{mod.}>0.5$ a bioassay is said to be excellent, since for $\sigma_l=\sigma_u$ a value of 0.5 is equivalent to a separation of 12 standard deviations between $\mu_u$ and $\mu_l$ . Accordingly, a value of -0.5 is equivalent to a separation of 3 standard deviations between $\mu_u$ and $\mu_l$ for $\sigma_l=\sigma_u$ . The rather moderate lower threshold is in particular of importance for MALDI MS-based bioassay exhibiting a relatively high variance in the data. ### Metrics subtab From a754d86dd3c07d9c659e0db00bb40d68d71772a5 Mon Sep 17 00:00:00 2001 From: Thomas Enzlein Date: Sat, 17 Aug 2024 10:44:07 +0200 Subject: [PATCH 12/13] more fixes --- manual.Rmd | 2 ++ manual.md | 2 ++ 2 files changed, 4 insertions(+) diff --git a/manual.Rmd b/manual.Rmd index 5f0c3be..3c8486a 100644 --- a/manual.Rmd +++ b/manual.Rmd @@ -192,6 +192,7 @@ The Strictly Standardized Mean Difference (*SSMD*), is implemented [Bray and Car $$ SSMD = \frac{|\mu_l-\mu_u|}{\sqrt{\sigma^2_u+\sigma^2_l}} $$ + In short: The *SSMD* gives the difference between the upper and lower part of the curves in units of standard deviation. Or in other words, it gives a weigthed differences. 
##### Curve-repsonse-score (CRS) @@ -203,6 +204,7 @@ CRS= 0 \quad for \quad Z'_{mod.}<-0.5 \quad or \quad V'_{mod.}<-0.5 \end{cases} $$ + with $$ diff --git a/manual.md b/manual.md index b9d01a4..0f3cf95 100644 --- a/manual.md +++ b/manual.md @@ -192,6 +192,7 @@ The Strictly Standardized Mean Difference (*SSMD*), is implemented [Bray and Car $$ SSMD = \frac{|\mu_l-\mu_u|}{\sqrt{\sigma^2_u+\sigma^2_l}} $$ + In short: The *SSMD* gives the difference between the upper and lower part of the curves in units of standard deviation. Or in other words, it gives a weigthed differences. ##### Curve-repsonse-score (CRS) @@ -203,6 +204,7 @@ CRS= 0 \quad for \quad Z'_{mod.}<-0.5 \quad or \quad V'_{mod.}<-0.5 \end{cases} $$ + with $$ From becc075429917899164007eab6c57bf8f28bf52d Mon Sep 17 00:00:00 2001 From: Thomas Enzlein Date: Sat, 17 Aug 2024 10:48:09 +0200 Subject: [PATCH 13/13] last try to fix formulas for today... --- manual.Rmd | 22 +++++++--------------- manual.md | 24 ++++++++---------------- 2 files changed, 15 insertions(+), 31 deletions(-) diff --git a/manual.Rmd b/manual.Rmd index 3c8486a..dcbf741 100644 --- a/manual.Rmd +++ b/manual.Rmd @@ -197,39 +197,31 @@ In short: The *SSMD* gives the difference between the upper and lower part of th ##### Curve-repsonse-score (CRS) -$$ -CRS= +$$CRS= \begin{cases} \frac{fcScore+vScore+zScore}{3}*100,\\ 0 \quad for \quad Z'_{mod.}<-0.5 \quad or \quad V'_{mod.}<-0.5 -\end{cases} -$$ +\end{cases}$$ with -$$ -fcScore= +$$fcScore= \begin{cases} 1 \quad for \quad |log_2FC| > log_2FC_{max}\\ \frac{|log_2FC|}{log_2FC} -\end{cases} -$$ +\end{cases}$$ and -$$ -vScore=V'_{mod.} -$$ +$$vScore=V'_{mod.}$$ and -$$ -zScore= +$$zScore= \begin{cases} 1 \quad for \quad Z'_{mod.}>0.5\\ \frac{Z'_{mod.}}{0.5} \quad for \quad 0.5 > Z'_{mod.}>-0.5 -\end{cases} -$$ +\end{cases}$$ The *CRS* combines three measures used to describe the quality of a response curve, the effect size defined as $Log_2FC$ and incorporated in the fcScore, the $V'_{mod.}$ 
factor being equal to the vScore and the $Z'_{mod.}$ factor used in the definition of the zScore. In the fcScore, the $Log_2FC$ is normalized by and thresholded at $Log_2FC_{max}=2.59$ . The factor is chosen to not overrate features that exhibit substantial changes. The restriction of the $Z'_{mod.}$ factor to the zScore is made due to the common interpretation of the *Z’* factor (Zhang, Chung and Oldenburg 1999). For $Z'_{mod.}>0.5$ a bioassay is said to be excellent, since for $\sigma_l=\sigma_u$ a value of 0.5 is equivalent to a separation of 12 standard deviations between $\mu_u$ and $\mu_l$ . Accordingly, a value of -0.5 is equivalent to a separation of 3 standard deviations between $\mu_u$ and $\mu_l$ for $\sigma_l=\sigma_u$ . The rather moderate lower threshold is in particular of importance for MALDI MS-based bioassay exhibiting a relatively high variance in the data. diff --git a/manual.md b/manual.md index 0f3cf95..f3c3287 100644 --- a/manual.md +++ b/manual.md @@ -37,7 +37,7 @@ The following features were already part of `MALDIcellassay`: # General information -The blue question mark icons () throughout the application can be clicked and provide further information on the specific settings. +The blue question mark icons (preservef45ef1239f806958) throughout the application can be clicked and provide further information on the specific settings. 
# Requirements to the raw data {#requirements-to-the-raw-data} @@ -197,39 +197,31 @@ In short: The *SSMD* gives the difference between the upper and lower part of th ##### Curve-repsonse-score (CRS) -$$ -CRS= +$$CRS= \begin{cases} \frac{fcScore+vScore+zScore}{3}*100,\\ 0 \quad for \quad Z'_{mod.}<-0.5 \quad or \quad V'_{mod.}<-0.5 -\end{cases} -$$ +\end{cases}$$ with -$$ -fcScore= +$$fcScore= \begin{cases} 1 \quad for \quad |log_2FC| > log_2FC_{max}\\ \frac{|log_2FC|}{log_2FC} -\end{cases} -$$ +\end{cases}$$ and -$$ -vScore=V'_{mod.} -$$ +$$vScore=V'_{mod.}$$ and -$$ -zScore= +$$zScore= \begin{cases} 1 \quad for \quad Z'_{mod.}>0.5\\ \frac{Z'_{mod.}}{0.5} \quad for \quad 0.5 > Z'_{mod.}>-0.5 -\end{cases} -$$ +\end{cases}$$ The *CRS* combines three measures used to describe the quality of a response curve, the effect size defined as $Log_2FC$ and incorporated in the fcScore, the $V'_{mod.}$ factor being equal to the vScore and the $Z'_{mod.}$ factor used in the definition of the zScore. In the fcScore, the $Log_2FC$ is normalized by and thresholded at $Log_2FC_{max}=2.59$ . The factor is chosen to not overrate features that exhibit substantial changes. The restriction of the $Z'_{mod.}$ factor to the zScore is made due to the common interpretation of the *Z’* factor (Zhang, Chung and Oldenburg 1999). For $Z'_{mod.}>0.5$ a bioassay is said to be excellent, since for $\sigma_l=\sigma_u$ a value of 0.5 is equivalent to a separation of 12 standard deviations between $\mu_u$ and $\mu_l$ . Accordingly, a value of -0.5 is equivalent to a separation of 3 standard deviations between $\mu_u$ and $\mu_l$ for $\sigma_l=\sigma_u$ . The rather moderate lower threshold is in particular of importance for MALDI MS-based bioassay exhibiting a relatively high variance in the data.
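
As a rough worked example of how the scores described above combine, consider the following sketch. The input values are invented for illustration and are not taken from any data set. It also assumes that the second case of the fcScore is meant to be divided by $log_2FC_{max}=2.59$, as the phrase "normalized by and thresholded at" suggests. Take $\mu_u=a_u=100$ , $\mu_l=a_l=20$ , $\sigma_u=10$ , $\sigma_l=6$ and $\sigma_f=4$ . Then

$$Z'_{mod.}=1-\frac{3*(10+6)}{|100-20|}=0.4 \quad and \quad zScore=\frac{0.4}{0.5}=0.8$$

$$V'_{mod.}=1-6*\frac{4}{|100-20|}=0.7 \quad and \quad vScore=0.7$$

$$Log_2FC=log_2\frac{100}{20}\approx 2.32 \quad and \quad fcScore\approx\frac{2.32}{2.59}\approx 0.90$$

$$SSMD=\frac{|20-100|}{\sqrt{10^2+6^2}}\approx 6.86$$

$$CRS\approx\frac{0.90+0.7+0.8}{3}*100\approx 80$$

Under these assumptions neither $Z'_{mod.}$ nor $V'_{mod.}$ falls below -0.5, so the *CRS* is not set to 0, and neither the fcScore nor the zScore reaches its upper threshold of 1.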