Skip to content

Commit

Permalink
Merge pull request #6397 from B0r1sD/trimal
Browse files Browse the repository at this point in the history
Add trimal
  • Loading branch information
bgruening authored Nov 15, 2024
2 parents fcdb311 + 8bbf4f1 commit b537658
Show file tree
Hide file tree
Showing 9 changed files with 420 additions and 0 deletions.
9 changes: 9 additions & 0 deletions tools/trimal/.shed.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
categories:
- Phylogenetics
description: Tool for automated alignment trimming
homepage_url: https://trimal.readthedocs.io
long_description: trimAl is a tool for the automated removal of spurious sequences
or poorly aligned regions from a multiple sequence alignment.
name: trimal
owner: iuc
remote_repository_url: https://github.com/inab/trimal
3 changes: 3 additions & 0 deletions tools/trimal/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# trimAl

[trimAl](https://github.com/inab/trimal), a tool for automated alignment trimming in large-scale phylogenetic analyses.
74 changes: 74 additions & 0 deletions tools/trimal/test-data/custom_trimmed_example.009.AA.html

Large diffs are not rendered by default.

11 changes: 11 additions & 0 deletions tools/trimal/test-data/custom_trimmed_example.009.AA.phy
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
9 63
Csa004271 --------------YMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH
Xtr21234 SQRQNDEAANNEAYYMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH
LcaH SQRQNDEAANNEAYYMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH
Hsa167996 SQRQNDEAANNEAYYMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH
Mmu024661 SQRQNDEAANNEAYYMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH
Dre37936 SQRQNDEAANNEAYYMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH
LcaM SQRQNDEAANNEAYYMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH
Tru14292 SQRQNDEAANNEAYYMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH
Ola20972 SQRQNDEAANNEAYYMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH

18 changes: 18 additions & 0 deletions tools/trimal/test-data/example.004.AA.fasta
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
>Sp8
FPWNGLQIHMMGIII

>Sp17
FPWNGLQIHMMGIII

>Sp10
FPWNGLQIHMMGIII

>Sp26
FPWNGLQIHMMGIII

>Sp33
FPWNGLQIHMMGIII

>Sp6
FPWNGLQIHMMGIII

45 changes: 45 additions & 0 deletions tools/trimal/test-data/example.009.AA.fasta
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
>Csa004271
---------------------------------MYMAMGHFFDRDDVALKNISEYFKECS
EEEREHANKMIEFHNKRGGTTTYFPIKAPGSFDPANFNTIKAMNCALALEVNVNKSLLAL
HE--TANGDPEFQDFIEANFLHEQVDAIKKLKDYITNLKLVG---TGLGEFLFDKHFKSS
-----
>Xtr21234
----MISQVRQNYSHDCEAAVNRMVNLEMYASYTYLSMSHYFDRDDVALHHVAEFFKEQS
KEERECAEKLMKCQNKRGGRIVLQDIKKPERDEWG--STLDAMQTALDLEKHVNQALLDL
HNLATERKDPHICDFLESEHLDEQVKHMKKFGDHITNLKRLGVPQNGMGEYLFDKHSLS-
-----
>LcaH
----MSSQVRQNFHQDCEAAINRQINLELYASYVYLSMAYYFDRDDQALHNFAKFFRHQS
HEEREHAEKLMKLQNQRGGRIFLQDVRKPDRDEWG--SGVEALECALQLEKSVNQSLLDL
HKLCSDHNDPHLCDFIETHYLDEQVKSIKELADWVTNLRRMGAPQNGMAEYLFDKHTLGK
ES--S
>Hsa167996
MTTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKNFAKYFLHQS
HEEREHAEKLMKLQNQRGGRIFLQDIKKPDCDDWE--SGLNAMECALHLEKNVNQSLLEL
HKLATDKNDPHLCDFIETHYLNEQVKAIKELGDHVTNLRKMGAPESGLAEYLFDKHTLGD
SDNES
>Mmu024661
MTTASPSQVRQNYHQDAEAAINRQINLELYASYVYLSMSCYFDRDDVALKNFAKYFLHQS
HEEREHAEKLMKLQNQRGGRIFLQDIKKPDRDDWE--SGLNAMECALHLEKSVNQSLLEL
HKLATDKNDPHLCDFIETYYLSEQVKSIKELGDHVTNLRKMGAPEAGMAEYLFDKHTLGH
GD-ES
>Dre37936
---METSQIRQNYVRDCEAAINKMINLELYAGYTYTSMAHYFKRDDVALPGFAKFFKKNS
EEEREHAEKFMEFQNKRGGRIVLQDIKKPDRDVWG--NGLIAMQCALQLEKNVNQALLDL
HKLATEMGDPHLCDFLETHYLNEQVEAIKKLGDHITNLSKMDAGNNRMAEYLFDKHTLDS
-----
>LcaM
----MESQVRQNYHRDCEAAVNRMVNMEMFASYTYTSMAFYFSRDDVALPGFSHFFKENS
DEEREHAEKLLSFQNKRGGHIFLQDIKKPERDEWG--SGLEAMQCALQLKKNVNQALLDL
HKLASDHGDPHLCDFLETHYLNEQVEAIKKLGDYISNLSRMDAQKNKMAEYLFDKHSLGG
KS---
>Tru14292
----MESQVRQNYHRDCEAAINKMINMELYASYTYTSMAFFFSRDDVALPGFAHFFKENS
DEEREHAEKLLSFQNKRGGRIFLQDIKKPERDEWG--SGLEAMQCALQLEKKVNQALLDL
HKLASDHVDPHLCDFLESHYLNEQVEAIKKLGDYITNLSRMDAQNNKMAEYLFDKHTLGS
KS---
>Ola20972
----MESQVRQNYHRDCEAAINRMVNMELFASYTYTSMAFYFDRDDVALPGFSHFFKENS
HEEKEHADKLLSFQNKRGGRIFLQDVKKPERDEWG--SGLEAMQCALQLEKNVNQALLDL
HKVASDHKDPHMCDFLETHYLNEQVESIKKIGDHITNLTRMDAHTNKMAEYLFDKHTLGS
KS---
71 changes: 71 additions & 0 deletions tools/trimal/test-data/trimmed_example.009.AA.html

Large diffs are not rendered by default.

58 changes: 58 additions & 0 deletions tools/trimal/test-data/trimmed_example.009.AA.mega
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
#MEGA
!Title ./test-data/example.009.AA.fasta;
!Format DataType=protein NSeqs=9 Nsites=174 indel=- CodeTable=Standard;

#Csa004271
---------- ---------- ---------M YMAMGHFFDR DDVALKNISE
YFKECSEEER EHANKMIEFH NKRGGTTTYF PIKAPGSFDP ANTIKAMNCA
LALEVNVNKS LLALHE--TA NGDPEFQDFI EANFLHEQVD AIKKLKDYIT
NLKLVG---T GLGEFLFDKH FKSS

#Xtr21234
MISQVRQNYS HDCEAAVNRM VNLEMYASYT YLSMSHYFDR DDVALHHVAE
FFKEQSKEER ECAEKLMKCQ NKRGGRIVLQ DIKKPERDEW GSTLDAMQTA
LDLEKHVNQA LLDLHNLATE RKDPHICDFL ESEHLDEQVK HMKKFGDHIT
NLKRLGVPQN GMGEYLFDKH SLS-

#LcaH
MSSQVRQNFH QDCEAAINRQ INLELYASYV YLSMAYYFDR DDQALHNFAK
FFRHQSHEER EHAEKLMKLQ NQRGGRIFLQ DVRKPDRDEW GSGVEALECA
LQLEKSVNQS LLDLHKLCSD HNDPHLCDFI ETHYLDEQVK SIKELADWVT
NLRRMGAPQN GMAEYLFDKH TLGK

#Hsa167996
STSQVRQNYH QDSEAAINRQ INLELYASYV YLSMSYYFDR DDVALKNFAK
YFLHQSHEER EHAEKLMKLQ NQRGGRIFLQ DIKKPDCDDW ESGLNAMECA
LHLEKNVNQS LLELHKLATD KNDPHLCDFI ETHYLNEQVK AIKELGDHVT
NLRKMGAPES GLAEYLFDKH TLGD

#Mmu024661
SPSQVRQNYH QDAEAAINRQ INLELYASYV YLSMSCYFDR DDVALKNFAK
YFLHQSHEER EHAEKLMKLQ NQRGGRIFLQ DIKKPDRDDW ESGLNAMECA
LHLEKSVNQS LLELHKLATD KNDPHLCDFI ETYYLSEQVK SIKELGDHVT
NLRKMGAPEA GMAEYLFDKH TLGH

#Dre37936
ETSQIRQNYV RDCEAAINKM INLELYAGYT YTSMAHYFKR DDVALPGFAK
FFKKNSEEER EHAEKFMEFQ NKRGGRIVLQ DIKKPDRDVW GNGLIAMQCA
LQLEKNVNQA LLDLHKLATE MGDPHLCDFL ETHYLNEQVE AIKKLGDHIT
NLSKMDAGNN RMAEYLFDKH TLDS

#LcaM
MESQVRQNYH RDCEAAVNRM VNMEMFASYT YTSMAFYFSR DDVALPGFSH
FFKENSDEER EHAEKLLSFQ NKRGGHIFLQ DIKKPERDEW GSGLEAMQCA
LQLKKNVNQA LLDLHKLASD HGDPHLCDFL ETHYLNEQVE AIKKLGDYIS
NLSRMDAQKN KMAEYLFDKH SLGG

#Tru14292
MESQVRQNYH RDCEAAINKM INMELYASYT YTSMAFFFSR DDVALPGFAH
FFKENSDEER EHAEKLLSFQ NKRGGRIFLQ DIKKPERDEW GSGLEAMQCA
LQLEKKVNQA LLDLHKLASD HVDPHLCDFL ESHYLNEQVE AIKKLGDYIT
NLSRMDAQNN KMAEYLFDKH TLGS

#Ola20972
MESQVRQNYH RDCEAAINRM VNMELFASYT YTSMAFYFDR DDVALPGFSH
FFKENSHEEK EHADKLLSFQ NKRGGRIFLQ DVKKPERDEW GSGLEAMQCA
LQLEKNVNQA LLDLHKVASD HKDPHMCDFL ETHYLNEQVE SIKKIGDHIT
NLTRMDAHTN KMAEYLFDKH TLGS

131 changes: 131 additions & 0 deletions tools/trimal/trimal.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,131 @@
<tool id="trimal" name="trimAl" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="23.0">
<description>for automated alignment trimming</description>
<macros>
<token name="@TOOL_VERSION@">1.5</token>
<token name="@VERSION_SUFFIX@">0</token>
</macros>
<xrefs>
<xref type="bio.tools">trimal</xref>
</xrefs>
<requirements>
<requirement type="package" version="@TOOL_VERSION@">trimal</requirement>
</requirements>
<command detect_errors="exit_code"><![CDATA[
trimal -in '$in' -out '$trimmed_output' -htmlout '$html_summary' ${out_format_selector}
#if $trimming_mode.mode_selector == "custom"
-gapthreshold $trimming_mode.gapthreshold
-simthreshold $trimming_mode.simthreshold
-cons $trimming_mode.cons
#else:
$trimming_mode.mode_selector
#end if
]]></command>
<inputs>
<param argument="-in" type="data" format="fasta,clustal,pir,phylip,nexus,mega" label="Alignment file (clustal, fasta, NBRF/PIR, nexus, phylip3.2, phylip)" />
<conditional name="trimming_mode">
<param name="mode_selector" type="select" label="Select trimming mode from the list">
<option value="-nogaps">nogaps - remove all positions with gaps in the alignment.</option>
<option value="-noallgaps">noallgaps - remove columns composed only by gaps.</option>
<option value="-gappyout">gappyout - only uses information based on gaps' distribution.</option>
<option value="-strict">strict - combine gappyout trimming with subsequent trimming based on an automatically selected similarity threshold. </option>
<option value="-strictplus">strictplus - very similar to the strict method but the final step of the algorithm is slightly different. </option>
<option value="-automated1">automated1 - heuristic approach to determine the optimal automatic method for trimming a given alignment. </option>
<option value="custom">custom mode - eliminates a specified set of columns defined by the user.</option>
</param>
<when value="-nogaps" />
<when value="-noallgaps"/>
<when value="-gappyout"/>
<when value="-strict"/>
<when value="-strictplus"/>
<when value="-automated1"/>
<when value="custom">
<param argument="-gapthreshold" type="float" optional="true" value="0.9" min="0.0" max="1.0" label="Gap threshold" help="1 - (fraction of sequences with a gap allowed)."/>
<param argument="-simthreshold" type="float" optional="true" value="0.9" min="0.0" max="1.0" label="Similarity threshold" help="Minimum average similarity allowed."/>
<param argument="-cons" type="integer" optional="true" value="50" min="0" max="100" label="Minimum conservance percentage" help="Minimum percentage of the positions in the original alignment to conserve."/>
</when>
</conditional>
<param name="out_format_selector" type="select" label="Select trimmed alignment output format from the list">
<option value="-clustal">CLUSTAL format</option>
<option value="-fasta">FASTA format</option>
<option value="-fasta_m10">FASTA format. Sequences name length up to 10 characters.</option>
<option value="-nbrf">NBRF/PIR format</option>
<option value="-nexus">NEXUS format</option>
<option value="-mega">MEGA format</option>
<option value="-phylip">PHYLIP/PHYLIP4 format</option>
<option value="-phylip_m10">PHYLIP/PHYLIP4 format. Sequences name length up to 10 characters</option>
<option value="-phylip_paml">PHYLIP format compatible with PAML</option>
<option value="-phylip_paml_m10">PHYLIP format compatible with PAML. Sequences name length up to 10 characters.</option>
<option value="-phylip3.2">PHYLIP3.2 format</option>
<option value="-phylip3.2_m10">PHYLIP3.2 format. Sequences name length up to 10 characters.</option>
</param>
</inputs>
<outputs>
<data name="trimmed_output" format="fasta" label="Trimmed alignment.">
<change_format>
<when input="out_format_selector" value="-fasta" format="fasta" />
<when input="out_format_selector" value="-fasta_m10" format="fasta" />
<when input="out_format_selector" value="-phylip" format="phylip" />
<when input="out_format_selector" value="-phylip_m10" format="phylip" />
<when input="out_format_selector" value="-phylip_paml" format="phylip" />
<when input="out_format_selector" value="-phylip_paml_m10" format="phylip" />
<when input="out_format_selector" value="-phylip3.2" format="phylip" />
<when input="out_format_selector" value="-phylip3.2_m10" format="phylip" />
<when input="out_format_selector" value="-clustal" format="clustal" />
<when input="out_format_selector" value="-mega" format="mega" />
<when input="out_format_selector" value="-nbrf" format="pir" />
<when input="out_format_selector" value="-nexus" format="nexus" />
</change_format>
</data>
<data name="html_summary" format="html" label="trimal html summary."/>
</outputs>
<tests>
<test expect_num_outputs="2">
<param name="in" value="example.009.AA.fasta"/>
<param name="mode_selector" value="-gappyout" />
<param name="out_format_selector" value="-mega" />
<output name="trimmed_output" file="trimmed_example.009.AA.mega" lines_diff="2"/>
<output name="html_summary" file="trimmed_example.009.AA.html"/>
</test>
<test expect_num_outputs="2">
<param name="in" value="example.009.AA.fasta"/>
<param name="mode_selector" value="custom" />
<param name="gapthreshold" value="0.5" />
<param name="simthreshold" value="0.5" />
<param name="cons" value="5" />
<param name="out_format_selector" value="-phylip_paml_m10" />
<output name="trimmed_output" file="custom_trimmed_example.009.AA.phy" ftype="phylip"/>
<output name="html_summary" file="custom_trimmed_example.009.AA.html"/>
</test>
</tests>
<help><![CDATA[
TrimAl is a tool for the automated removal of spurious sequences or poorly aligned regions from a multiple sequence alignment.
TrimAl can consider several parameters, alone or in multiple combinations, in order to select the most-reliable positions in the alignment.
These include the proportion of sequences with a gap, the level of residue similarity and, if several alignments for the same set of sequences are provided,
the consistency level of columns among alignments. Moreover, trimAl allows to manually select a set of columns and sequences to be removed from the alignment.
TrimAl implements a series of automated algorithms that trim the alignment searching for optimum thresholds based on inherent characteristics
of the input alignment, to be used so that the signal-to-noise ratio after alignment trimming phase is increased. Learn more about the available trimming modes here: https://trimal.readthedocs.io/en/latest/algorithms.html
Among trimAl's additional features, trimAl allows:
- getting the complementary alignment (columns that were trimmed),
- to compute statistics from the alignment,
- to select the output file format,
- to get a summary of trimAl's trimming in HTML and SVG formats,
- and many other options.
TrimAl webpage: https://trimal.readthedocs.io
License
-------
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, the last available version.
]]></help>
<citations>
<citation type="doi">doi:10.1093/bioinformatics/btp348</citation>
</citations>
</tool>

0 comments on commit b537658

Please sign in to comment.