Skip to content

Commit

Permalink
[W4Mconcat] First version of a new tool to concatenate W4M metadata t…
Browse files Browse the repository at this point in the history
…ables (#269)

* [W4Mconcat] First version of a new tool to concatenate W4M metadata tables

---------

Co-authored-by: nourine <[email protected]>
Co-authored-by: Björn Grüning <[email protected]>
  • Loading branch information
3 people authored Jul 10, 2024
1 parent 584dea1 commit eba1150
Show file tree
Hide file tree
Showing 16 changed files with 1,059 additions and 0 deletions.
9 changes: 9 additions & 0 deletions tools/w4mconcatenate/.shed.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
categories: [Metabolomics]
description: '[W4M][Utils] concatenate two Metadata tables'
homepage_url: http://workflow4metabolomics.org
long_description: 'Part of the W4M project: http://workflow4metabolomics.org / The
R script concatenates two metadata file (sample metadata or
variable metadata) and returns a metadata file and two data matrix files.'
name: w4mconcatenate
owner: workflow4metabolomics
remote_repository_url: https://github.com/workflow4metabolomics/tools-metabolomics
55 changes: 55 additions & 0 deletions tools/w4mconcatenate/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
# W4M concatenate


Metadata
-----------

* **@name**: W4M concatenate
* **@galaxyID**: W4Mconcatenate
* **@version**: 1.0.0
* **@authors**: Original code: Hanane Nourine (Intern - PFEM - INRAE) - Maintainer: Melanie Petera (PFEM - INRAE - MetaboHUB)
* **@init date**: 2024, May
* **@main usage**: This tool enables the concatenation of two matrices of Metadata and returns a matrix of Metadata and two DataMatrix


Context
-----------

This tool is provided as one of the [Workflow4Metabolomics](http://workflow4metabolomics.org) Galaxy instance data handling tools.
W4M is an international infrastructure providing software tools to process, analyse and annotate metabolomics data.

User interface is based on the Galaxy platform (homepage: https://galaxyproject.org/). It is an open, web-based platform for data intensive biomedical research.
Whether on the free public server or your own instance, you can perform, reproduce, and share complete analyses.

The tool was created as part of a Master's level internship.

Configuration
-----------

### Requirement:
* R software: version = 4.3.3 recommended
* r-dplyr = 1.1.4
* r-w4mrutils = 1.0.0

Technical description
-----------

Main files:

- concatenation.R: R function (core script)
- fonctions_auxiliaires.R : R auxiliary functions.
- concatenation_wrapper.R: R script to link the main R function to inputs
- concat.xml: XML wrapper (interface for Galaxy)


Services provided
-----------

* Help and support: https://community.france-bioinformatique.fr/c/workflow4metabolomics/10



License
-----------

* Cea Cnrs Inria Logiciel Libre License, version 2.1 (CECILL-2.1)
261 changes: 261 additions & 0 deletions tools/w4mconcatenate/concat.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,261 @@
<tool id="W4Mconcatenate" name="W4M concatenate" version="1.0.0+galaxy0">
<description>to merge two metadata tables</description>
<requirements>
<requirement type="package" version="4.3.3">r-base</requirement>
<requirement type="package" version="1.1.4">r-dplyr</requirement>
<requirement type="package" version="1.0.0">r-w4mrutils</requirement>
</requirements>
<command detect_errors="exit_code">
<![CDATA[
Rscript '$__tool_directory__/concatenation_wrapper.R'
dataMatrix_1 '$dataMatrix_1'
dataMatrix_2 '$dataMatrix_2'
metadata_1 '$metadata_1'
metadata_2 '$metadata_2'
type '$type'
tab1 '$tab1'
tab2 '$tab2'
concatenation '$concatenation'
choice_keep '$choice.choice_keep'
#if str($choice.choice_keep) == 'yes':
keep 0
#end if
#if str($choice.choice_keep) == 'no':
keep "$choice.keep"
#end if
dataMatrix_1_out '$dataMatrix_1_out'
dataMatrix_2_out '$dataMatrix_2_out'
metadata_out '$metadata_out'
]]></command>


<inputs>

<param name="dataMatrix_1" type="data" label="Data matrix file 1" format="tabular"/>
<param name="dataMatrix_2" type="data" label="Data matrix file 2" format="tabular"/>
<param name="metadata_1" type="data" label="Metadata file 1" format="tabular"/>
<param name="metadata_2" type="data" label="Metadata file 2" format="tabular"/>

<param name="type" type="select" display="radio" label="Type of metadata">
<option value="sample">Sample metadata</option>
<option value="variable">Variable metadata</option>
</param>

<param name="concatenation" type="select" display="radio" label="Type of concatenation">
<option value="unique">Unique</option>
<option value="intersection">Intersection</option>
<option value="union">Union</option>
</param>

<conditional name="choice">
<param name="choice_keep" type="select" display="radio" label="Keep all or just one">
<option value="yes">Keep all</option>
<option value="no">Keep one</option>
</param>
<when value="no">
<param name="keep" type="select" display="radio" label="Which Metadata to keep">
<option value="1">Metadata 1</option>
<option value="2">Metadata 2</option>
</param>
</when>
<when value="yes">
</when>
</conditional>

<param name="tab1" type="text" label="Suffix for Metadata 1" value="Tab1"/>
<param name="tab2" type="text" label="Suffix for Metadata 2" value="Tab2"/>

</inputs>



<outputs>

<data name="dataMatrix_1_out" label="Concat_${dataMatrix_1.name}" format="tabular"></data>
<data name="dataMatrix_2_out" label="Concat_${dataMatrix_2.name}" format="tabular"></data>
<data name="metadata_out" label="Concat_${metadata_1.name}" format="tabular"></data>

</outputs>

<tests>
<test>
<param name="dataMatrix_1" value="Input_Unique_Test_1-2-3-4-5_DM1.txt"/>
<param name="dataMatrix_2" value="Input_Unique_Test_1-2-3_DM2.txt"/>
<param name="metadata_1" value="Input_Unique_Test_1_SM1.txt"/>
<param name="metadata_2" value="Input_Unique_Test_1_SM2.txt"/>
<param name="concatenation" value="unique"/>
<param name="choice_keep" value="yes"/>
<param name="tab1" value="tab1"/>
<param name="tab2" value="tab2"/>

<output name="dataMatrix_1_out" file="Output_Attendu_Unique_Test_1-2-3_DM1.txt"/>
<output name="dataMatrix_2_out" file="Output_Attendu_Unique_Test_1-2-3_DM2.txt"/>
<output name="metadata_out" file="Output_Attendu_Unique_Test_1_Metadata.txt"/>


</test>

<test expect_failure="true">
<param name="dataMatrix_1" value="Input_Unique_Test_1-2-3-4-5_DM1.txt"/>
<param name="dataMatrix_2" value="Input_Unique_Test_4_DM2.txt"/>
<param name="metadata_1" value="Input_Unique_Test_4_SM1.txt"/>
<param name="metadata_2" value="Input_Unique_Test_4_SM2.txt"/>
<param name="concatenation" value="unique"/>
<param name="choice_keep" value="yes"/>
<param name="tab1" value="tab1"/>
<param name="tab2" value="tab2"/>

</test>

</tests>

<help><![CDATA[
.. class:: infomark
**Credits**
| **Original tool code:** Hanane Nourine
| **Original tool wrapping:** Hanane Nourine
| **Tool maintainer:** Melanie Petera (PFEM - INRAE - MetaboHUB)
| **Wrapper maintainer:** Melanie Petera (PFEM - INRAE - MetaboHUB)
.. class:: infomark
**Contact:** Melanie Petera (PFEM - INRAE - MetaboHUB)
---------------------------------------------------
******************
Concatenation
******************
===========
DESCRIPTION
===========
Concatenate two Metadata tables
---------------------------------------------------
==========================
ALIGNMENT WITH OTHER TOOLS
==========================
This tool is designed to work with the W4M 3-tables format. Inputs are supposed to follow this standard.
Outputs follows the standard and thus can be used in further W4M-supported workflows.
-----------
Input files
-----------
+----------------------------------+------------+
| Parameter : num + label | Format |
+==================================+============+
| 1 : First Data matrix file | tabular |
+----------------------------------+------------+
| 2 : First metadata file | tabular |
+----------------------------------+------------+
| 3 : Second Data matrix file | tabular |
+----------------------------------+------------+
| 4 : Second metadata file | tabular |
+----------------------------------+------------+
------------
Output files
------------
+----------------------------------+------------+
| Parameter : num + label | Format |
+==================================+============+
| 1 : Metadata concatenate file | tabular |
+----------------------------------+------------+
| 2 : First Data matrix file | tabular |
+----------------------------------+------------+
| 3 : Second Data matrix file | tabular |
+----------------------------------+------------+
---------------------------------------------------
===============
TOOL PARAMETERS
===============
Data matrix
| contains the intensity values of the analytical variables.
| First line is the sample identifiers and first column the variable identifiers.
|
Metadata file
| contains metadata about samples OR variables (only one type possible at a time).
| Each metadata file must match its corresponding data matrix file regarding identifiers.
|
Type of metadata
| Specify which type of metadata table is given as metadata file
|
Type of concatenation
| Selects the type of concatenation
| Unique: identifiers are the same in both datasets (i.e. metadata file 1 and metadata file 2 are expected to match perfectly regarding identifiers)
| Intersection: keep only identifiers in common
| Union: keep all identifiers in both datasets
|
Choice_keep
| Determine what to do in case a column name is found in the two metadata files
| - Keep all: both columns and change their names by adding a prefix
| - Keep one: the column belonging to metadata 1 or metadata 2
|
Prefixes for redundant columns
| By default, the prefixes are Tab1 and Tab2
|
==================
OUTPUT DESCRIPTION
==================
**Unique**: the tables must have the same identifiers, otherwise a warning will be generated.
The output returned is a unique merged metadata file without modifying the DataMatrices.
**Intersection**: there must be identifiers in common between the two metadata tables, otherwise an error will be displayed.
Moreover, if the number of identifiers in common is less than 5, a warning will be generated.
The DataMatrix outputs will only keep common identifiers.
**Union**: the identifiers don't necessarily have to be the same between the metadata tables. If no identifiers are common, a warning will be generated.
The modified DataMatrix will then include all the individuals, adding NA values for those that are missing.
If common pairs of sample and variable exist between the dataset 1 and dataset 2 with no difference in the intensity values,
the intersections will be added instead of NA.
]]></help>
<citations>
<citation type="doi">10.1093/bioinformatics/btu813</citation>
</citations>

<creator>
<organization name="PFEM" url="https://eng-pfem.isc.inrae.fr/"></organization>
<organization name="Workflow4metabolomics" url="https://workflow4metabolomics.org/"></organization>

<person givenName="Hanane" familyName="Nourine" jobTitle="Intern"></person>
<person givenName="Mélanie" familyName="Petera"></person>
</creator>
</tool>
Loading

0 comments on commit eba1150

Please sign in to comment.