From e9cefc46e6505d94ea559fe041b4ae523f5584ac Mon Sep 17 00:00:00 2001 From: Thomas Enzlein <70519530+thomas-enzlein@users.noreply.github.com> Date: Tue, 29 Oct 2024 07:44:10 +0100 Subject: [PATCH] rename metrics (#4) * clean up lasso experiment remnants * sync package loading with req.txt file * clean up redundant code to check and install packages * update dockerfile to fix config errors when building * set up tests for mzML and Bruker file loading * clean up main directory * activate GHA caching * fix `zoo` dependency problem * remove browser() for debugging pca plots. * update actions/cache to v4.0.2 * update GHA to use versions instead of hashes * add tests to GHA * add test that checks if processing was successful * set smoothHalfWindowSize to a more reasonable value * remove devtools and use pak instead * update metrics naming (mod Z' -> FZ, mod V' -> FV, SSMD -> FS) --- .gitignore | 1 + README.md | 78 +++++++++++++++++++------------------- components/mainTab.R | 6 +-- components/server.R | 2 +- figures/logo_resized.png | Bin 0 -> 7180 bytes functions/helpers.R | 11 ++++++ functions/plotFunctions.R | 10 ++--- functions/storeResults.R | 8 ++++ manual.Rmd | 44 ++++++++++----------- manual.md | 44 ++++++++++----------- 10 files changed, 111 insertions(+), 93 deletions(-) create mode 100644 figures/logo_resized.png diff --git a/.gitignore b/.gitignore index bdc7acd..646b475 100644 --- a/.gitignore +++ b/.gitignore @@ -10,3 +10,4 @@ testdata_bruker.zip testdata_mzML.zip Curve mzMl +bayer_export.R diff --git a/README.md b/README.md index d1956ac..7387c1a 100644 --- a/README.md +++ b/README.md @@ -1,36 +1,34 @@ -M²ara overview - -# M²ara - MALDI MS Bioassays Evaluation and Classification App +# M2ara M²ara is a software tool to facilitate the exploration of metabolomic responses in complex matrix-assisted laser desorption/ionization mass spectrometry (MALDI MS) bioassays. 
The app is intended for the evaluation of metabolomic drug actions by using the mass-to-charge ratios of hundreds of metabolites and it is particularly useful in defining novel pharmacodynamic biomarkers for high-throughput applications. M²ara is based on the R package [MALDIcellassay](https://github.com/CeMOS-Mannheim/MALDIcellassay) (published in [Unger et. al. 2021](https://www.nature.com/articles/s41596-021-00624-z), Nature Protocols) and extends its capabilities with a GUI and adds helpful features like clustering of curves, PCA analysis as well as the Curve Response Score (CRS) which enables fast screening for molecules regulated by drug treatment. -For more information please check out the [preprint](https://chemrxiv.org/engage/chemrxiv/article-details/663a1d0f418a5379b0aa286b). +For more information please check out the [preprint](https://chemrxiv.org/engage/chemrxiv/article-details/663a1d0f418a5379b0aa286b). -M²ara workflow overview +M²ara workflow overview ## How to use -This application simplifies the analysis of Molecular High Content Screening (MHCS) MALDI-TOF MS assay data and the evaluation of complex drug actions. After your data has been loaded, you can adjust settings as needed and start the processing. From here, you can analyze your data by selecting entries in the data table, visually inspect and rank mass features using the Curve Response Score (CRS) fingerprints, and save the curve fit and peak profile of your chosen *m/z* value. +This application simplifies the analysis of Molecular High Content Screening (MHCS) MALDI-TOF MS assay data and the evaluation of complex drug actions. After your data has been loaded, you can adjust settings as needed and start the processing. From here, you can analyze your data by selecting entries in the data table, visually inspect and rank mass features using the Curve Response Score (CRS) fingerprints, and save the curve fit and peak profile of your chosen *m/z* value. 
This app is specifically designed for use with Bruker flex series raw data but also features support for mzML. For more detailed information please take a look at the [Manual](manual.md) that is also available inside the app. -M²ara GUI overview +M²ara GUI overview -## How to install +## How to install ### R Clone the GitHub repository to your local machine (please make sure to have R installed, tested with **R v4.3.2**) and start the app by sourcing the `app.R` file. -```bash +``` bash git clone https://github.com/CeMOS-Mannheim/M2ara.git ``` -```R +``` r # install all packages needed source("install_packages.R") @@ -39,23 +37,24 @@ source("app.R") ``` ### Docker -Install the [docker container](https://hub.docker.com/repository/docker/thomasenzlein/m2ara), run it and access `localhost:3838` to interact with the app. -Don't forget to change the path `c:/path/to/massSpecData` to your data so that it can be mounted when running the container. -```bash +Install the [docker container](https://hub.docker.com/repository/docker/thomasenzlein/m2ara), run it and access `localhost:3838` to interact with the app. Don't forget to change the path `c:/path/to/massSpecData` to your data so that it can be mounted when running the container. + +``` bash docker pull thomasenzlein/m2ara:main ``` -```bash +``` bash docker run -p 3838:3838 -v c:/path/to/massSpecData:/mnt thomasenzlein/m2ara:main ``` ### Stand-alone installer for Windows -Use the stand-alone installer (Windows only, no R installation needed). -The installer can be downloaded [here](https://github.com/CeMOS-Mannheim/M2ara/releases/download/1.4.1/M2ara_1.4.1.exe). + +Use the stand-alone installer (Windows only, no R installation needed). The installer can be downloaded [here](https://github.com/CeMOS-Mannheim/M2ara/releases/download/1.4.1/M2ara_1.4.1.exe). ## Example data -To test the app please use the example data on [FigShare](https://dx.doi.org/10.6084/m9.figshare.25736541). 
+ +To test the app please use the example data on [FigShare](https://dx.doi.org/10.6084/m9.figshare.25736541). #### Unger2020_OATP2B1_inhibition_mzML.zip @@ -63,17 +62,17 @@ The file contains mzML data (converted from Bruker Flex using MSConvert) origina To replicate the results shown use the following parameters: -- under Settings set File Format to mzML -- set Concentration unit to nM -- set Normalization/re-calibration *m/z* to 354.1418 (D4-E3S, [M-H]-) -- set recalibration tolerance to 0.1 Da -- set normalization to *m/z* -- deactivate smoothing and activate baseline removal -- set Aggregation method to mean -- set SNR to 3 -- set alignment to 0 mDa (no alignment) -- set binning tolerance to 100 ppm -- select the folder `mzML` (parent folder of the mzML files) from the .zip file, please make sure that no other files are in this folder. +- under Settings set File Format to mzML +- set Concentration unit to nM +- set Normalization/re-calibration *m/z* to 354.1418 (D4-E3S, [M-H]-) +- set recalibration tolerance to 0.1 Da +- set normalization to *m/z* +- deactivate smoothing and activate baseline removal +- set Aggregation method to mean +- set SNR to 3 +- set alignment to 0 mDa (no alignment) +- set binning tolerance to 100 ppm +- select the folder `mzML` (parent folder of the mzML files) from the .zip file, please make sure that no other files are in this folder. Alternatively, copy the [this file](https://github.com/CeMOS-Mannheim/M2ara/blob/main/tests/testthat/settings_mzML_data.csv) as `settings.csv` into the main folder of the app. 
@@ -85,19 +84,18 @@ The file contains data in the Bruker Flex format originally published in Weigt, To replicate the results shown use the following parameters: -- under Settings set File Format to Bruker Flex -- set Concentration unit to µM -- set Normalization/re-calibration *m/z* to 760.5851 (PC(34:1) [M+H]+) -- set recalibration tolerance to 0.1 Da -- set normalization to TIC -- activate smoothing and baseline removal -- set Aggregation method to mean -- set SNR to 3 -- set alignment to 0 mDa (no alignment) -- set binning tolerance to 100 ppm -- select the the folder `curve` from the .zip file, make sure no other files/folders are present. +- under Settings set File Format to Bruker Flex +- set Concentration unit to µM +- set Normalization/re-calibration *m/z* to 760.5851 (PC(34:1) [M+H]+) +- set recalibration tolerance to 0.1 Da +- set normalization to TIC +- activate smoothing and baseline removal +- set Aggregation method to mean +- set SNR to 3 +- set alignment to 0 mDa (no alignment) +- set binning tolerance to 100 ppm +- select the folder `curve` from the .zip file, make sure no other files/folders are present. Alternatively, copy the [this file](https://github.com/CeMOS-Mannheim/M2ara/blob/main/tests/testthat/settings_bruker_data.csv) as `settings.csv` into the main folder of the app. The target is *m/z* 826.5722 (PC(36:1) [M+K]+) and *m/z* 616.1767 (Heme B [M+H]+) the pIC50 values should be 9.5 and 9.7. 
- diff --git a/components/mainTab.R b/components/mainTab.R index 6233de4..75add9b 100644 --- a/components/mainTab.R +++ b/components/mainTab.R @@ -53,9 +53,9 @@ mainTab <- function() { selectInput(inputId = "metric", label = "Metric", choices = c("CRS", - "Z'", - "V'", - "SSMD", + "FZ", + "FV", + "FS", "log2FC", "pEC50"), selected = defaults$errorbars, diff --git a/components/server.R b/components/server.R index dabb0dd..8d4a805 100644 --- a/components/server.R +++ b/components/server.R @@ -51,6 +51,7 @@ server <- function(input, output, session) { normMz = input$normMz, normTol = input$normTol, normMeth = input$normMeth, + smoothHalfWindowSize = 3, alignTol = input$alignTol * 1e-3, halfWindowSize = input$halfWindowSize, peakMethod = input$peakMethod) @@ -118,7 +119,6 @@ server <- function(input, output, session) { dir = appData$selected_dir ) ) - message(MALDIcellassay:::timeNow(), " processing done\n") # write everything needed into appData diff --git a/figures/logo_resized.png b/figures/logo_resized.png new file mode 100644 index 0000000000000000000000000000000000000000..890dc0a80c150087fc89e18e2e287083df71c0a2 GIT binary patch literal 7180 zcmXY0bzD?muwPb??oMfz?plzNl?JILmPUGkrBh((5RmSWlI{ip=~zIz6_!*&KqQUF z@4ffOx#x4|%$fO4eP-_G#=X*3A;hD@0{{SoYO2b5kNf$53kUP@nQq2H`?xXNs_JP0 z0D+tUKv*OIaQ{dOI|2ZFgaCl=RsaAv8vvk2m{e|Kq5!+i$3xa)w3&atoURP1~2LR|b)sz(t{MSzl0y-HCb7H=p^}Siy z40^C18h#h0K*gLxg42vef`y^NV3f}cE+rs}7d?tSKN@F~i^UgVO~Mxi&a%C=2_vPk zBQ;na=7>}n%v4}yX9;rq+ICZaSE}v0rKX&PHKN?d776=1~Z=icaFDj>Q`mB?e=A?oX^O20~5Xvb+dE$ zLqv0^e8v`2!K8HwYDprVvqlK2_Hlnl`jNU{J%h4XQ#mxpP@f+e1>pu$$y=+HHYo0>5jzN>L@zDc%(lI{ z;sO%oLZbvv^VOq$MB?m#MW3`#PFG#L&LtUXM~6-~@#`PsEJjZm#G{KqLX)78mzLiw z331oI)t*YIBHn2jpI%4h4TyFniaLYg(MBe-qHt@;7rW0$j11ltCpe*OxyPEb4YVHM zwv?fjc0&?>TPg(ZU4F;5~9K_}9Q! 
z^reoPc57}fV8a7NeSn+_@YD|ny$>d~=YHs4MFaow`en(K$5weJ@Vx4GTi z&OCAhVFq;shh3+a+{q8MW7%Jhc=e=I1Bq0&%-+y%j6R>AjtOutZBWr2QEFGE+(4a7 zPE$9Uci#zqgI_Iy%x>cinq+^7N{DFQ(R`D#&nyyIKXTFSY|LF@|H>{s;_rkLoZm|R ziy>L^UY(*Iz9TvRBa?nNv3uZ@FT?F#2%GK=2cFU7rjEtsRxr_=s#Lo{<} zj_<&cO4M>h;Vv}`kD%p>YfDi!5jtVta2LrpdQ|(9W(C%jMw@@GxPtIA^7C&gC3eIXX zmozoE3gDpsW3bAUdgM^CZWXFNpB)`(ZYAxb`b6YqAS9y{vmUEUu4K-&Z^Db)L)*g; zvJzoWu_`5M_sOUD4)ctBR+t0v+))mxe^Apjd;nq>%NxMW8<-3~R%pT9RBE_0H9;^s zN+4n1ge`gT>z_<>BIvgcT>iVghI!Tqk>mb9V%GFXG3I~A=j^bN+Ug_lXE@Mr<0)t( z_w0m|c^VoSEr?sJ6;=K8!y2ac+UCmtDxkEUyf(!(!_KY4d4M;-43%vWxmw%lr#>zV z1zVV9UGYLt>rZ`ZNQw}#tN~x6i0YDKB`@53Z=8GsAohnYOt-c!I%cOHF6i5|1G3I7 zQl&HlR|KL2G)S{>#G#EJ6wF!xW;D4nO~=CK!oyGU5ZAe+N%lg{W||XWM>KDPNg$-7 zIV{arx8*$rdnCqK4U0$+-zpi;ev4i5i+L)JM_DARxr2q5T^HYc7D2d6PqDe^kc1j1 zznsYQl78wMdNamW#7posZH#g1Qyx}uo?>#YOLT6iG|P?2P=-LUOWL*tiUOb+Kwf+z zO$s3F2~|O%6D?VFe7*b}-lo9xL8<%7v3~;^|5!l`NdM+gB1ZImY8eh}4#m;FIJ`EX z%+e^nm)#0sGPE5pl1$^`c@)!1WoCjaBRmLguFz1zWyp5qmpb5XoWq*R2G;O^g>-S? zMS*FTt1}4kx1O(pOE8Hv#i-!8n}W${`OI;Vz=vss7d?Y5zcUY9BGida>2K`)|^ai)*@CmK&>BiOky{!5ib z!`98B*R3{&To~8B^TA zvVz?Ng>h6EHfJ!B_d~$1aSDfP(|^G1kJ$FH~>yfN!|A z#PD3*YLL2f9&hW98oz`kS^Dt$6;GegV@ys?KwS@y4t`OkWM%gWBkoH$%>0j9bXL^> z;1ws=j5aRAyK^iFHy^PDu+o2ch1V_UG}{B~Ru+$bx>C&Sy_insWi7 zeXm9&Fj<3#^Y&2D$w{($l9=u`CW9P%qxalE}pjo^z1$%W1-uM z3q^ZOqWKc?+}S%0&fkNW2dP$bZ*xa-8+NX^q=%s12Eo<^IF?pJKrb56dY(UO(A9X3 zMqNnFc}rL%KVnlClH7Tp_O|Mr!;5(s&q5c7&3+>u!)1kv3t4+imc};Eazc;E@#9-pI+iV>I{jD95d9sG{fn!sei{YdI7@TGAgQtkl52 z>e&uHeE8-VNFPC)DLN7nXN5Q(PP>NGo)m&C~; z*(x54*7xKaV@P`*MsbEoBB(V^c@@}UXxzA?6Q5J(;IkT^)G>e!F%{=%B;J8FlENx= zV)+g4(l&$Dwy7)Y;yF>QcB^eqn&4Dxj06g z#kTc@=?xbxo9RBDQAB9)4hfq=IQJaYsxliKdAjoz%w`X672q*~HSIo{%sM)pK2R7b zpIUBO!eGnSWbNvhUhi^;~P*D}! 
zY7xnc*d+a1xBFJd6CXBr0V{3>Z4&AmhE-8p5W)m^RZq4o}w64t+Xt}=^}rSuCext!(_+7 z0+@xg5MSJZbOL2cBM9fw(pXpVqLtp_#t*}?m@|O1#a=$a60+4ZJk5kd3ODZ01i!O7 zHl|2E&zv~MJKH?w?)ABQL@Pq}>RoH1vXT9lbL!RXb}wavC~;D!$bn4xDswQIt9yst z$2~<0Y^$c+B^E9-B#fsd!iO(Z@55r2LBfEv#FNHXiS`q5>f7O``lVx=kdANDr+#Z2 zp}$%V0og-yr0I*89B|~XyyB78BbgQ3XD#H%Nay)Z(fNgT*ICRB5tm7e0O^<~>Q0FW9hfZGF|OB{8q^ zl?nmEy0qcV&-_L9%8C;umZjnJIi3@rrn;3R?3+gNnFeAn4u%!Q!Ft?e* zGzK+lVZ5@n@n5ozyb~{^<1+Z{B)HdAi~R&*)+Pea4sQmf=Slj)I0J%b9A@3lE*&wl zJOG4=t|*U_mZyY?d|54wgo);S-fIs((F|@km0?DW)P#v5$<6Umnx!x?8IPm`Q;F0J z3s~IYox?=gN5}~ojM85a(9%_4HbQzv9oh$ldXmGq)kAs5$n&YgrR!cRUncl{YtbEQ z+b(mVhu@{y$9V#mK{F z)j65Qm z&sI;I1mo8S)L}H#7npuQ79&~y$;XlqJchBjB$th8dy8EF1e`OWi|Qa8^7iC#36NM6 zX<$Cl&deKFZX7l9xX7^P1$X6QM$2$}5#kT`h;V?X3vWQ7XbYxY!)S=Rly-c_MMJ(5 zQ(HnpKqagHQVA-?KgPfALiXlDL@jCWb><}xZ6InqA%tmHZn?4G?NN7|P@>S_)&DQMwZ7|N9b5sO}(v{H{eEs|V{;f9cMmZ6( z17Np3wgZQ;mdLi$QZDJ)-uS$`3L2)w6tt>uX~T;Y_UY%pBR}x=u0wo1=#M*>GG%|k z`-bXYX>2Ll@~;N|SROrau_)HF#kV|sS}})xB|7d*<%Qp$y1p-0Z`0-UiyNwnHF~0A zHY;xL@T#=v1HeD`U+Hjiz5A_xZLJI@yf?pL+Q2KL z8zRVC<85GegQlOscPs=Xzupo1{!Mo=f4Fx?uy7RhU8L3S!!NqiAio8yx{{EH7@iHD z!G1!Xn!VQJFN&ERQzG9t=Ih!DM)-;%HFYiw)m)2(zXu(QPCj$}!TQTGhQCp}d?12y zDDsSo;rc$RVS`;GU}k!A%n&1s)PLmKFTuMyL=0#IotvRGwu{4h7@>Y@H|~40b#TEo ze~Xd)^f!M{$E}^d8pPh9(C4;$yOQzk4WhtkFT$zl{NykiIX}!n>j+FsrcJE@?pJ0VtL|OD^ z{J2EM`|7Z1WTvjMX@54#%DkWn8OJp|EaXs`I2mt~;JNUy#q=d|B~Z0jxf`RNJlY4# zy$+I}e7bzjut6(?Z9a+G`QU>Slrx*maNPLor^E84Y!8iydy;zoBr&~ckc=jBmkId6 zwPEXx+FzD!Pq~A4euQ5Sbf4C^CE&>|oH?^hP`h4IbteC4{!*?Z+@R}H5zUM60b)nA z?zTOQz%HU}dkS(UD-m`O659(hWx4`A00>zr5r)OR_}kXw%psy66r>hRJl zjoX|DN}u&FcCzi7n0@zUrzrK*o`BsBItnxLD6A~=YDbN2B|>2GNVxdeSEp|89w;;* z7)F0x+$~X|-jRn=p{GRCRSfXg&wTBKI-XIKgQK_2{v z97-?CmFoO!aIbD(nIon^7f!w;mM<}Ejy4%-G%0K< zh(@w`Q@7-~*}ztBN6M!FyMrFK&O~hgY}eO0!%8!I zrxh?}EuDe>+g;filY^dx&X}Npa7mC*4oChFE_&9)zEQYz&KmE8DKPpLb!18yfKqF1 zurz)+)z10h#WOX1^QIQfwg*b}u8r)>4C(|IFNB7Y=G{~+Gi6JXKEk=Ov#j5>eW z{^=vQLl>Jse*QNk*q~n4c`Vma5>y~`))DePzs_uQigIO6QS!8+m*Z?lX?f-SWpFyL 
zt2uudrn&dV9^?K)aBcxIbb|2Cthu=f`H{Ki+w0%#=0p9Db4|d2?RaP}aEQDXZ8YCM zBT>Ii-s{ZEP>0sB2zvA0L+0N=5y!&iWbI*^nPRBwDI&%@9C@Dmjf_Bpn1nr4tJ%iG zk*j7+Jx2;AYtLs7D*2}|W+8iQGP0IGi$^59aV|0hKae{k`I-K2FpZC0F)4<3Uif5R z5jvkViqr3_wq*5TxWQ!XAwXUu8}56ge$|p9b7y%CXhb?>Yv(I65z4A8*u-D`4OIP? z_8s`F4CItZ%<*m4z-31WPk;)F#_>R5{!}wmOW*eZ*hs8|ktd3+q?8V5C()?9ca)v7 zKWL2Y(k#y)a6cy#Y)wNaBY(IbiS3dA!v#j5w}o`TL~`VwAc7|k4+E&xEA3tDYR$$NXh>;ja8;=^%uL? zx_>myt3CYP-?N$ArKGmBhlwts%7NHnvL5!r34YS?$~t}|UR7*jx~^l+ZcJj6jz>T1 zU^%lpl>B8OoNdnpU#@eNu%6}KE2*^JMGrh7Lpq4A=x_67Y4GCKnA7V})`T^W(O@X3 z(}@{b_54+$_LrBDTJ#x`OL8a}({YshP>honH9yKW?Q)O*v6}|X@lA0i-fF(eCODs^ zV4~|qS%}Dwh@woUJ@yPxqgoE@1Ou>fBKxJa*zIwd+l^CxgX!mrJw^t$KWOGT-Gf}Hiy>it8l=g;JTE^p68wO{#- zl9C-F`X-nx?U*jIgWW|EJ_Hk&H6_EOr$!PRDkT{irrLV-A~O8{d^@ zIL^-3n90sdmrI;NI_PS`#!FlwgXN6q@*S!k2q!YF zJL#T9ck5Ot7g`gYg9V$AhpVPTN|yDQb LXe-w%S%v=(uh^F< literal 0 HcmV?d00001 diff --git a/functions/helpers.R b/functions/helpers.R index b1b0b44..b5c62d5 100644 --- a/functions/helpers.R +++ b/functions/helpers.R @@ -106,3 +106,14 @@ checkMetaData <- function(object) { return(TRUE) } + +#' Extract directory path +#' +#' @param object Object of class MALDIassay +#' +#' @return +#' List, containing the data used to do the fits as well as the nlpr curve fit . 
+getDirectory <- function(object) { + MALDIcellassay:::stopIfNotIsMALDIassay(object) + return(object@settings$dir) +} diff --git a/functions/plotFunctions.R b/functions/plotFunctions.R index 5c2742d..add274a 100644 --- a/functions/plotFunctions.R +++ b/functions/plotFunctions.R @@ -267,7 +267,7 @@ plateMapPlot <- function(appData, return(p) } -scorePlot <- function(stats, metric = c("CRS", "V'", "Z'", "log2FC", "pEC50", "SSMD")) { +scorePlot <- function(stats, metric = c("CRS", "FV", "FZ", "log2FC", "pEC50", "FS")) { metric <- match.arg(metric) df <- stats %>% @@ -275,7 +275,7 @@ scorePlot <- function(stats, metric = c("CRS", "V'", "Z'", "log2FC", "pEC50", "S select(c("mz", "direction")) %>% mutate(value = pull(stats, metric)) - if(metric %in% c("V'", "Z'")) { + if(metric %in% c("FV", "FZ")) { # cut V' and Z' at zero as lower values then zero just indicate bad models # and its prettier for visualization df <- df %>% @@ -284,7 +284,7 @@ scorePlot <- function(stats, metric = c("CRS", "V'", "Z'", "log2FC", "pEC50", "S limits <- c(-1, 1) } - if(metric %in% c("CRS", "V'", "Z'", "SSMD")) { + if(metric %in% c("CRS", "FV", "FZ", "FS")) { df <- df %>% mutate(value = if_else(direction == "down", -value, value)) } @@ -300,14 +300,14 @@ scorePlot <- function(stats, metric = c("CRS", "V'", "Z'", "log2FC", "pEC50", "S y = ylab, col = NULL) - if(metric %in% c("V'", "Z'")) { + if(metric %in% c("FV", "FZ")) { p <- p + scale_y_continuous(limits = limits, breaks = c(-1, -0.5, 0, 0.5, 1), labels = c(1, 0.5, 0, 0.5 , 1)) } - if(metric %in% c("log2FC", "SSMD")) { + if(metric %in% c("log2FC", "FS")) { absVal <- abs(df$value) absVal <- absVal[!is.infinite(absVal)] absMax <- max(absVal, na.rm = TRUE) diff --git a/functions/storeResults.R b/functions/storeResults.R index 050a1c9..a72b039 100644 --- a/functions/storeResults.R +++ b/functions/storeResults.R @@ -1,6 +1,14 @@ storeResults <- function(appData, res, input, stats) { appData$res <- res appData$preprocessing <- appData$preprocessing 
+ + # rename Z', V', SSMD to FZ, FV and FS + stats <- stats %>% + rename("FZ" = `Z'`, + "FV" = `V'`, + "FS" = SSMD) + + appData$stats_original <- stats # copy of original stats for updates appData$stats <- stats diff --git a/manual.Rmd b/manual.Rmd index dcbf741..0bb1b77 100644 --- a/manual.Rmd +++ b/manual.Rmd @@ -29,7 +29,7 @@ The following features were already part of `MALDIcellassay`: - graphical user interface - interactive data exploration - support for [mzML](#mzml) data \* -- calculation of quality metrics (*Z'*, *V'*, *log2FC*, *CRS*) \* +- calculation of quality metrics (*FZ*, *FV*, *log2FC*, *CRS*) \* - feature ranking by metric \* - principle component analysis (PCA) - curve clustering @@ -121,7 +121,7 @@ The analysis pipeline consist of the following steps (see figure below for a gra 9. `Intensity matrix`: The peaks of the average spectra are transformed into a matrix with columns representing *m/z* values and rows representing concentrations whereas cells contain the respective intensity. 10. `Varience filtering` is applied. 11. `Curve fitting` is performed. -12. `Quality metrics` are calculated (*V'*, *Z'*, *SSMD*, *Log2FC*, *CRS*). +12. `Quality metrics` are calculated (*FV*, *FZ*, *FS*, *Log2FC*, *CRS*). 13. The peaks can be selected in the `Peak table`. 14. The respective dose-response curve as well as the peak profile is visualized and might be saved. @@ -147,22 +147,22 @@ Below the two plots the peak table is shown. Here all found signals as well as a **M²ara** comes with a variety of helpful scores/metrics that are meant to help judging the quality of response curves. -##### Modified Z': +##### FZ: -In pharmaceutical industry and research, the quality of a bioassay is assessed by common metrics that rely on a negative and positive control [Zhang et al., 1999](https://pubmed.ncbi.nlm.nih.gov/10838414/), [Iversen et al., 2006](https://doi.org/10.1177/1087057105285610), [Ravkin et al., 2004](http://www.ravkin.net/articles/5322-7.pdf) . 
However, in order to be able to explore unknown cellular drug effects in whole-cell MALDI MS bioassays and to classify m/z features as either up-, down- or non-regulated, characteristic measures need to be deduced from the concentration response data directly. First, to assess the variability within the assay data relative to the effective window size, a modified form of the *Z’* factor [Zhang et al., 1999](https://pubmed.ncbi.nlm.nih.gov/10838414/), defined by +In pharmaceutical industry and research, the quality of a bioassay is assessed by common metrics that rely on a negative and positive control [Zhang et al., 1999](https://pubmed.ncbi.nlm.nih.gov/10838414/), [Iversen et al., 2006](https://doi.org/10.1177/1087057105285610), [Ravkin et al., 2004](http://www.ravkin.net/articles/5322-7.pdf) . However, in order to be able to explore unknown cellular drug effects in whole-cell MALDI MS bioassays and to classify m/z features as either up-, down- or non-regulated, characteristic measures need to be deduced from the concentration response data directly. First, to assess the variability within the assay data relative to the effective window size, a modified form of the *Z'* factor [Zhang et al., 1999](https://pubmed.ncbi.nlm.nih.gov/10838414/), defined by $$ -Z'_{mod.} = 1-\frac{3*(\sigma_u+\sigma_l)}{|\mu_u-\mu_l|} +F_{Z} = 1-\frac{3*(\sigma_u+\sigma_l)}{|\mu_u-\mu_l|} $$ -is implemented into **M²ara**. The modified *Z'* score helps to make a judgment about the distance of the means ( $\mu$ , more is better) and standard deviation ( $\sigma$ , less is better) of the upper ( $_u$ ) and lower ( $_l$ ) end of the curve. +is implemented into **M²ara**. The modified *FZ* score helps to make a judgment about the distance of the means ( $\mu$ , more is better) and standard deviation ( $\sigma$ , less is better) of the upper ( $_u$ ) and lower ( $_l$ ) end of the curve. 
-##### Modified V': +##### FV: -The modified *V'* [Ravkin et al., 2004](http://www.ravkin.net/articles/5322-7.pdf) is introduced to assess the root-mean-square deviation of the response data relative to the log-logistic model fit, determined by +A modified *V'* [Ravkin et al., 2004](http://www.ravkin.net/articles/5322-7.pdf) is introduced to assess the root-mean-square deviation of the response data relative to the log-logistic model fit, determined by $$ -V'_{mod.}=1-6*\frac{\sigma_f}{|a_u-a_l|} +F_{V}=1-6*\frac{\sigma_f}{|a_u-a_l|} $$ with @@ -171,8 +171,8 @@ $$ \sigma_f=\sqrt{\frac{1}{N}\sum(f_{exp}-f)^2} $$ -where $\sigma_f$ is the standard deviation of the residuals of the 4-parameter non-linear regression model *f* calculated from the experimental (exp) data and the model. Hereby, the modified *V'* factor reflects the goodness of the fit and thus the variance within all data points described by the model. -In short: *V'* focuses more on the goodness of fit of the curve to the data points. +where $\sigma_f$ is the standard deviation of the residuals of the 4-parameter non-linear regression model *f* calculated from the experimental (exp) data and the model. Hereby, the *FV* factor reflects the goodness of the fit and thus the variance within all data points described by the model. +In short: *FV* focuses more on the goodness of fit of the curve to the data points. ##### Log2-Fold-Change @@ -185,22 +185,22 @@ $$ where $a_u$ and $a_l$ the upper and lower asymptotes. In short: The $Log_2FC$ gives the raw (no variation of data points considered) difference between the upper and lower part of the curve. 
-##### SSMD +##### FS -The Strictly Standardized Mean Difference (*SSMD*), is implemented [Bray and Carpenter 2004](https://pubmed.ncbi.nlm.nih.gov/23469374/); [Zhang et al., 2007](https://doi.org/10.1016/j.ygeno.2006.12.014), with: +The *FS* is based on the Strictly Standardized Mean Difference (*SSMD*) and is implemented [Bray and Carpenter 2004](https://pubmed.ncbi.nlm.nih.gov/23469374/); [Zhang et al., 2007](https://doi.org/10.1016/j.ygeno.2006.12.014), with: $$ -SSMD = \frac{|\mu_l-\mu_u|}{\sqrt{\sigma^2_u+\sigma^2_l}} +F_S = \frac{|\mu_l-\mu_u|}{\sqrt{\sigma^2_u+\sigma^2_l}} $$ -In short: The *SSMD* gives the difference between the upper and lower part of the curves in units of standard deviation. Or in other words, it gives a weigthed differences. +In short: The *FS* gives the difference between the upper and lower part of the curves in units of standard deviation. Or in other words, it gives a weighted difference. ##### Curve-repsonse-score (CRS) $$CRS= \begin{cases} \frac{fcScore+vScore+zScore}{3}*100,\\ -0 \quad for \quad Z'_{mod.}<-0.5 \quad or \quad V'_{mod.}<-0.5 +0 \quad for \quad F_{Z}<-0.5 \quad or \quad F_{V}<-0.5 \end{cases}$$ with @@ -213,27 +213,27 @@ $$fcScore= and -$$vScore=V'_{mod.}$$ +$$vScore=F_{V}$$ and $$zScore= \begin{cases} -1 \quad for \quad Z'_{mod.}>0.5\\ -\frac{Z'_{mod.}}{0.5} \quad for \quad 0.5 > Z'_{mod.}>-0.5 +1 \quad for \quad F_{Z}>0.5\\ +\frac{F_{Z}}{0.5} \quad for \quad 0.5 > F_{Z}>-0.5 \end{cases}$$ -The *CRS* combines three measures used to describe the quality of a response curve, the effect size defined as $Log_2FC$ and incorporated in the fcScore, the $V'_{mod.}$ factor being equal to the vScore and the $Z'_{mod.}$ factor used in the definition of the zScore. In the fcScore, the $Log_2FC$ is normalized by and thresholded at $Log_2FC_{max}=2.59$ . The factor is chosen to not overrate features that exhibit substantial changes. 
The restriction of the $Z'_{mod.}$ factor to the zScore is made due to the common interpretation of the *Z’* factor (Zhang, Chung and Oldenburg 1999). For $Z'_{mod.}>0.5$ a bioassay is said to be excellent, since for $\sigma_l=\sigma_u$ a value of 0.5 is equivalent to a separation of 12 standard deviations between $\mu_u$ and $\mu_l$ . Accordingly, a value of -0.5 is equivalent to a separation of 3 standard deviations between $\mu_u$ and $\mu_l$ for $\sigma_l=\sigma_u$ . The rather moderate lower threshold is in particular of importance for MALDI MS-based bioassay exhibiting a relatively high variance in the data. +The *CRS* combines three measures used to describe the quality of a response curve, the effect size defined as $Log_2FC$ and incorporated in the fcScore, the $F_{V}$ factor being equal to the vScore and the $F_{Z}$ factor used in the definition of the zScore. In the fcScore, the $Log_2FC$ is normalized by and thresholded at $Log_2FC_{max}=2.59$ . The factor is chosen to not overrate features that exhibit substantial changes. The restriction of the $F_{Z}$ factor to the zScore is made due to the common interpretation of the *FZ* factor (Zhang, Chung and Oldenburg 1999). For $F_{Z}>0.5$ a bioassay is said to be excellent, since for $\sigma_l=\sigma_u$ a value of 0.5 is equivalent to a separation of 12 standard deviations between $\mu_u$ and $\mu_l$ . Accordingly, a value of -0.5 is equivalent to a separation of 3 standard deviations between $\mu_u$ and $\mu_l$ for $\sigma_l=\sigma_u$ . The rather moderate lower threshold is in particular of importance for MALDI MS-based bioassay exhibiting a relatively high variance in the data. ### Metrics subtab -The metrics screen enables to visualize different metrics (*Z'*, *V'*, *SSMD*, *logFC*, *CRS* as well as pEC50, etc.) as a function of **m/z**. The direction of the peaks (up or down) highlights the direction of regulation (if the intensity of the signal increases or decreases with the concentration). 
It is therefor useful to get a fast overview of the whole data set. The different metrics concentrate on different aspects of the quality of the curve. +The metrics screen enables to visualize different metrics (*FZ*, *FV*, *FS*, *logFC*, *CRS* as well as pEC50, etc.) as a function of **m/z**. The direction of the peaks (up or down) highlights the direction of regulation (if the intensity of the signal increases or decreases with the concentration). It is therefor useful to get a fast overview of the whole data set. The different metrics concentrate on different aspects of the quality of the curve. ## QC tab The top part of the OC tab focuses on the (potential) peak used for re-calibration and enables the user to inspect the alignment of the (average) spectra per concentration. -The lower left part shows different metrics (both assay quality metrics like *Z'*, *V'*, *CRS* and MALDI parameters like total ion current as well as re-calibration shifts and PCA loadings) per spot in a target plate view. **This functionality is currently only featured for Bruker raw data. And wont be visible with the `mzML` input file format selected.** +The lower left part shows different metrics (both assay quality metrics like *FZ*, *FV*, *CRS* and MALDI parameters like total ion current as well as re-calibration shifts and PCA loadings) per spot in a target plate view. **This functionality is currently only featured for Bruker raw data. And wont be visible with the `mzML` input file format selected.** The lower right shows processing (and in case of Bruker data also some measurement meta data) as a summary. 
 diff --git a/manual.md b/manual.md index e3ebe68..3ef3019 100644 --- a/manual.md +++ b/manual.md @@ -27,7 +27,7 @@ The following features were already part of `MALDIcellassay`: - graphical user interface - interactive data exploration - support for [mzML](#mzml) data \* -- calculation of quality metrics (*Z'*, *V'*, *log2FC*, *CRS*) \* +- calculation of quality metrics (*FZ*, *FV*, *log2FC*, *CRS*) \* - feature ranking by metric \* - principle component analysis (PCA) - curve clustering @@ -120,7 +120,7 @@ The analysis pipeline consist of the following steps (see figure below for a gra 9. `Intensity matrix`: The peaks of the average spectra are transformed into a matrix with columns representing *m/z* values and rows representing concentrations whereas cells contain the respective intensity. 10. `Varience filtering` is applied. 11. `Curve fitting` is performed. -12. `Quality metrics` are calculated (*V'*, *Z'*, *SSMD*, *Log2FC*, *CRS*). +12. `Quality metrics` are calculated (*FV*, *FZ*, *FS*, *Log2FC*, *CRS*). 13. The peaks can be selected in the `Peak table`. 14. The respective dose-response curve as well as the peak profile is visualized and might be saved. @@ -147,22 +147,22 @@ Below the two plots the peak table is shown. Here all found signals as well as a **M²ara** comes with a variety of helpful scores/metrics that are meant to help judging the quality of response curves. -##### Modified Z': +##### FZ: -In pharmaceutical industry and research, the quality of a bioassay is assessed by common metrics that rely on a negative and positive control [Zhang et al., 1999](https://pubmed.ncbi.nlm.nih.gov/10838414/), [Iversen et al., 2006](https://doi.org/10.1177/1087057105285610), [Ravkin et al., 2004](http://www.ravkin.net/articles/5322-7.pdf) . 
However, in order to be able to explore unknown cellular drug effects in whole-cell MALDI MS bioassays and to classify m/z features as either up-, down- or non-regulated, characteristic measures need to be deduced from the concentration response data directly. First, to assess the variability within the assay data relative to the effective window size, a modified form of the *Z’* factor [Zhang et al., 1999](https://pubmed.ncbi.nlm.nih.gov/10838414/), defined by +In pharmaceutical industry and research, the quality of a bioassay is assessed by common metrics that rely on a negative and positive control [Zhang et al., 1999](https://pubmed.ncbi.nlm.nih.gov/10838414/), [Iversen et al., 2006](https://doi.org/10.1177/1087057105285610), [Ravkin et al., 2004](http://www.ravkin.net/articles/5322-7.pdf) . However, in order to be able to explore unknown cellular drug effects in whole-cell MALDI MS bioassays and to classify m/z features as either up-, down- or non-regulated, characteristic measures need to be deduced from the concentration response data directly. First, to assess the variability within the assay data relative to the effective window size, a modified form of the *Z'* factor [Zhang et al., 1999](https://pubmed.ncbi.nlm.nih.gov/10838414/), defined by $$ -Z'_{mod.} = 1-\frac{3*(\sigma_u+\sigma_l)}{|\mu_u-\mu_l|} +F_{Z} = 1-\frac{3*(\sigma_u+\sigma_l)}{|\mu_u-\mu_l|} $$ -is implemented into **M²ara**. The modified *Z'* score helps to make a judgment about the distance of the means ( $\mu$ , more is better) and standard deviation ( $\sigma$ , less is better) of the upper ( $_u$ ) and lower ( $_l$ ) end of the curve. +is implemented into **M²ara**. The modified *FZ* score helps to make a judgment about the distance of the means ( $\mu$ , more is better) and standard deviation ( $\sigma$ , less is better) of the upper ( $_u$ ) and lower ( $_l$ ) end of the curve. 
-##### Modified V':
+##### FV:

-The modified *V'* [Ravkin et al., 2004](http://www.ravkin.net/articles/5322-7.pdf) is introduced to assess the root-mean-square deviation of the response data relative to the log-logistic model fit, determined by
+A modified *V'* [Ravkin et al., 2004](http://www.ravkin.net/articles/5322-7.pdf) is introduced to assess the root-mean-square deviation of the response data relative to the log-logistic model fit, determined by

 $$
-V'_{mod.}=1-6*\frac{\sigma_f}{|a_u-a_l|}
+F_{V}=1-6*\frac{\sigma_f}{|a_u-a_l|}
 $$

 with
@@ -171,8 +171,8 @@ $$
 \sigma_f=\sqrt{\frac{1}{N}\sum(f_{exp}-f)^2}
 $$

-where $\sigma_f$ is the standard deviation of the residuals of the 4-parameter non-linear regression model *f* calculated from the experimental (exp) data and the model. Hereby, the modified *V'* factor reflects the goodness of the fit and thus the variance within all data points described by the model.
-In short: *V'* focuses more on the goodness of fit of the curve to the data points.
+where $\sigma_f$ is the standard deviation of the residuals of the 4-parameter non-linear regression model *f* calculated from the experimental (exp) data and the model. Hereby, the *FV* factor reflects the goodness of the fit and thus the variance within all data points described by the model.
+In short: *FV* focuses more on the goodness of fit of the curve to the data points.

 ##### Log2-Fold-Change

@@ -185,22 +185,22 @@ $$
 where $a_u$ and $a_l$ the upper and lower asymptotes.
 In short: The $Log_2FC$ gives the raw (no variation of data points considered) difference between the upper and lower part of the curve.

-##### SSMD
+##### FS

-The Strictly Standardized Mean Difference (*SSMD*), is implemented [Bray and Carpenter 2004](https://pubmed.ncbi.nlm.nih.gov/23469374/); [Zhang et al., 2007](https://doi.org/10.1016/j.ygeno.2006.12.014), with:
+The *FS* is based on the Strictly Standardized Mean Difference (*SSMD*) and is implemented [Bray and Carpenter 2004](https://pubmed.ncbi.nlm.nih.gov/23469374/); [Zhang et al., 2007](https://doi.org/10.1016/j.ygeno.2006.12.014), with:

 $$
-SSMD = \frac{|\mu_l-\mu_u|}{\sqrt{\sigma^2_u+\sigma^2_l}}
+F_S = \frac{|\mu_l-\mu_u|}{\sqrt{\sigma^2_u+\sigma^2_l}}
 $$

-In short: The *SSMD* gives the difference between the upper and lower part of the curves in units of standard deviation. Or in other words, it gives a weigthed differences.
+In short: The *FS* gives the difference between the upper and lower part of the curves in units of standard deviation. Or in other words, it gives a weighted difference.

 ##### Curve-repsonse-score (CRS)

 $$CRS=
 \begin{cases}
 \frac{fcScore+vScore+zScore}{3}*100,\\
-0 \quad for \quad Z'_{mod.}<-0.5 \quad or \quad V'_{mod.}<-0.5
+0 \quad for \quad F_{Z}<-0.5 \quad or \quad F_{V}<-0.5
 \end{cases}$$

 with
@@ -213,27 +213,27 @@ $$fcScore=

 and

-$$vScore=V'_{mod.}$$
+$$vScore=F_{V}$$

 and

 $$zScore=
 \begin{cases}
-1 \quad for \quad Z'_{mod.}>0.5\\
-\frac{Z'_{mod.}}{0.5} \quad for \quad 0.5 > Z'_{mod.}>-0.5
+1 \quad for \quad F_{Z}>0.5\\
+\frac{F_{Z}}{0.5} \quad for \quad 0.5 > F_{Z}>-0.5
 \end{cases}$$

-The *CRS* combines three measures used to describe the quality of a response curve, the effect size defined as $Log_2FC$ and incorporated in the fcScore, the $V'_{mod.}$ factor being equal to the vScore and the $Z'_{mod.}$ factor used in the definition of the zScore. In the fcScore, the $Log_2FC$ is normalized by and thresholded at $Log_2FC_{max}=2.59$ . The factor is chosen to not overrate features that exhibit substantial changes. The restriction of the $Z'_{mod.}$ factor to the zScore is made due to the common interpretation of the *Z’* factor (Zhang, Chung and Oldenburg 1999). For $Z'_{mod.}>0.5$ a bioassay is said to be excellent, since for $\sigma_l=\sigma_u$ a value of 0.5 is equivalent to a separation of 12 standard deviations between $\mu_u$ and $\mu_l$ . Accordingly, a value of -0.5 is equivalent to a separation of 3 standard deviations between $\mu_u$ and $\mu_l$ for $\sigma_l=\sigma_u$ . The rather moderate lower threshold is in particular of importance for MALDI MS-based bioassay exhibiting a relatively high variance in the data.
+The *CRS* combines three measures used to describe the quality of a response curve, the effect size defined as $Log_2FC$ and incorporated in the fcScore, the $F_{V}$ factor being equal to the vScore and the $F_{Z}$ factor used in the definition of the zScore. In the fcScore, the $Log_2FC$ is normalized by and thresholded at $Log_2FC_{max}=2.59$ . The factor is chosen to not overrate features that exhibit substantial changes. The restriction of the $F_{Z}$ factor to the zScore is made due to the common interpretation of the *Z'* factor (Zhang, Chung and Oldenburg 1999). For $F_{Z}>0.5$ a bioassay is said to be excellent, since for $\sigma_l=\sigma_u$ a value of 0.5 is equivalent to a separation of 12 standard deviations between $\mu_u$ and $\mu_l$ . Accordingly, a value of -0.5 is equivalent to a separation of 4 standard deviations between $\mu_u$ and $\mu_l$ for $\sigma_l=\sigma_u$ . The rather moderate lower threshold is in particular of importance for MALDI MS-based bioassays exhibiting a relatively high variance in the data.

 ### Metrics subtab

-The metrics screen enables to visualize different metrics (*Z'*, *V'*, *SSMD*, *logFC*, *CRS* as well as pEC50, etc.) as a function of **m/z**. The direction of the peaks (up or down) highlights the direction of regulation (if the intensity of the signal increases or decreases with the concentration). It is therefor useful to get a fast overview of the whole data set. The different metrics concentrate on different aspects of the quality of the curve.
+The metrics screen makes it possible to visualize different metrics (*FZ*, *FV*, *FS*, *logFC*, *CRS* as well as pEC50, etc.) as a function of **m/z**. The direction of the peaks (up or down) highlights the direction of regulation (if the intensity of the signal increases or decreases with the concentration). It is therefore useful to get a fast overview of the whole data set. The different metrics concentrate on different aspects of the quality of the curve.

 ## QC tab

 The top part of the OC tab focuses on the (potential) peak used for re-calibration and enables the user to inspect the alignment of the (average) spectra per concentration.

-The lower left part shows different metrics (both assay quality metrics like *Z'*, *V'*, *CRS* and MALDI parameters like total ion current as well as re-calibration shifts and PCA loadings) per spot in a target plate view. **This functionality is currently only featured for Bruker raw data. And wont be visible with the `mzML` input file format selected.**
+The lower left part shows different metrics (both assay quality metrics like *FZ*, *FV*, *CRS* and MALDI parameters like total ion current as well as re-calibration shifts and PCA loadings) per spot in a target plate view. **This functionality is currently only available for Bruker raw data and won't be visible with the `mzML` input file format selected.**

 The lower right shows processing (and in case of Bruker data also some measurement meta data) as a summary.
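
Reviewer note: the renamed metrics in the hunks above can be cross-checked with a small Python sketch. This is an illustration only, not the app's R implementation. All function names are hypothetical, and the form of `fcScore` (absolute $Log_2FC$ capped at 2.59 and divided by 2.59) is an assumption inferred from the prose, because its definition lies outside the changed hunks.

```python
import math

LOG2FC_MAX = 2.59  # normalization/threshold value quoted in the manual text


def fz(mu_u, mu_l, sd_u, sd_l):
    """F_Z = 1 - 3*(sd_u + sd_l) / |mu_u - mu_l| (upper/lower end of the curve)."""
    return 1 - 3 * (sd_u + sd_l) / abs(mu_u - mu_l)


def fv(residuals, a_u, a_l):
    """F_V = 1 - 6*sigma_f / |a_u - a_l|, with sigma_f the RMS of the fit residuals."""
    sigma_f = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return 1 - 6 * sigma_f / abs(a_u - a_l)


def fs(mu_u, mu_l, sd_u, sd_l):
    """F_S = |mu_l - mu_u| / sqrt(sd_u^2 + sd_l^2) (SSMD-based)."""
    return abs(mu_l - mu_u) / math.sqrt(sd_u ** 2 + sd_l ** 2)


def crs(log2fc, f_v, f_z):
    """Curve Response Score as described in the manual text."""
    if f_z < -0.5 or f_v < -0.5:
        return 0.0
    # ASSUMPTION: fcScore is |log2FC| capped at LOG2FC_MAX, scaled to [0, 1].
    fc_score = min(abs(log2fc), LOG2FC_MAX) / LOG2FC_MAX
    v_score = f_v
    z_score = 1.0 if f_z > 0.5 else f_z / 0.5
    return (fc_score + v_score + z_score) / 3 * 100


# Equal standard deviations (sd = 1): a separation of 12 sd gives F_Z = 0.5.
print(fz(12.0, 0.0, 1.0, 1.0))  # 0.5
print(crs(2.59, 1.0, 0.6))      # best case under the assumed fcScore
```

With equal standard deviations, `fz` reduces to `1 - 6*sd/|mu_u - mu_l|`, which makes the thresholds in the CRS paragraph easy to check by hand.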