Merge pull request #6397 from B0r1sD/trimal
Add trimal
Showing 9 changed files with 420 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
categories: | ||
- Phylogenetics | ||
description: Tool for automated alignment trimming | ||
homepage_url: https://trimal.readthedocs.io | ||
long_description: trimAl is a tool for the automated removal of spurious sequences | ||
or poorly aligned regions from a multiple sequence alignment. | ||
name: trimal | ||
owner: iuc | ||
remote_repository_url: https://github.com/inab/trimal |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# trimAl | ||
|
||
[trimAl](https://github.com/inab/trimal) is a tool for automated alignment trimming in large-scale phylogenetic analyses. |
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
9 63 | ||
Csa004271 --------------YMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH | ||
Xtr21234 SQRQNDEAANNEAYYMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH | ||
LcaH SQRQNDEAANNEAYYMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH | ||
Hsa167996 SQRQNDEAANNEAYYMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH | ||
Mmu024661 SQRQNDEAANNEAYYMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH | ||
Dre37936 SQRQNDEAANNEAYYMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH | ||
LcaM SQRQNDEAANNEAYYMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH | ||
Tru14292 SQRQNDEAANNEAYYMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH | ||
Ola20972 SQRQNDEAANNEAYYMFRDDALFSEEEAKNRGGPAALLVNLLLHDPDFELEQVKDNLELFDKH | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
>Sp8 | ||
FPWNGLQIHMMGIII | ||
|
||
>Sp17 | ||
FPWNGLQIHMMGIII | ||
|
||
>Sp10 | ||
FPWNGLQIHMMGIII | ||
|
||
>Sp26 | ||
FPWNGLQIHMMGIII | ||
|
||
>Sp33 | ||
FPWNGLQIHMMGIII | ||
|
||
>Sp6 | ||
FPWNGLQIHMMGIII | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
>Csa004271 | ||
---------------------------------MYMAMGHFFDRDDVALKNISEYFKECS | ||
EEEREHANKMIEFHNKRGGTTTYFPIKAPGSFDPANFNTIKAMNCALALEVNVNKSLLAL | ||
HE--TANGDPEFQDFIEANFLHEQVDAIKKLKDYITNLKLVG---TGLGEFLFDKHFKSS | ||
----- | ||
>Xtr21234 | ||
----MISQVRQNYSHDCEAAVNRMVNLEMYASYTYLSMSHYFDRDDVALHHVAEFFKEQS | ||
KEERECAEKLMKCQNKRGGRIVLQDIKKPERDEWG--STLDAMQTALDLEKHVNQALLDL | ||
HNLATERKDPHICDFLESEHLDEQVKHMKKFGDHITNLKRLGVPQNGMGEYLFDKHSLS- | ||
----- | ||
>LcaH | ||
----MSSQVRQNFHQDCEAAINRQINLELYASYVYLSMAYYFDRDDQALHNFAKFFRHQS | ||
HEEREHAEKLMKLQNQRGGRIFLQDVRKPDRDEWG--SGVEALECALQLEKSVNQSLLDL | ||
HKLCSDHNDPHLCDFIETHYLDEQVKSIKELADWVTNLRRMGAPQNGMAEYLFDKHTLGK | ||
ES--S | ||
>Hsa167996 | ||
MTTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKNFAKYFLHQS | ||
HEEREHAEKLMKLQNQRGGRIFLQDIKKPDCDDWE--SGLNAMECALHLEKNVNQSLLEL | ||
HKLATDKNDPHLCDFIETHYLNEQVKAIKELGDHVTNLRKMGAPESGLAEYLFDKHTLGD | ||
SDNES | ||
>Mmu024661 | ||
MTTASPSQVRQNYHQDAEAAINRQINLELYASYVYLSMSCYFDRDDVALKNFAKYFLHQS | ||
HEEREHAEKLMKLQNQRGGRIFLQDIKKPDRDDWE--SGLNAMECALHLEKSVNQSLLEL | ||
HKLATDKNDPHLCDFIETYYLSEQVKSIKELGDHVTNLRKMGAPEAGMAEYLFDKHTLGH | ||
GD-ES | ||
>Dre37936 | ||
---METSQIRQNYVRDCEAAINKMINLELYAGYTYTSMAHYFKRDDVALPGFAKFFKKNS | ||
EEEREHAEKFMEFQNKRGGRIVLQDIKKPDRDVWG--NGLIAMQCALQLEKNVNQALLDL | ||
HKLATEMGDPHLCDFLETHYLNEQVEAIKKLGDHITNLSKMDAGNNRMAEYLFDKHTLDS | ||
----- | ||
>LcaM | ||
----MESQVRQNYHRDCEAAVNRMVNMEMFASYTYTSMAFYFSRDDVALPGFSHFFKENS | ||
DEEREHAEKLLSFQNKRGGHIFLQDIKKPERDEWG--SGLEAMQCALQLKKNVNQALLDL | ||
HKLASDHGDPHLCDFLETHYLNEQVEAIKKLGDYISNLSRMDAQKNKMAEYLFDKHSLGG | ||
KS--- | ||
>Tru14292 | ||
----MESQVRQNYHRDCEAAINKMINMELYASYTYTSMAFFFSRDDVALPGFAHFFKENS | ||
DEEREHAEKLLSFQNKRGGRIFLQDIKKPERDEWG--SGLEAMQCALQLEKKVNQALLDL | ||
HKLASDHVDPHLCDFLESHYLNEQVEAIKKLGDYITNLSRMDAQNNKMAEYLFDKHTLGS | ||
KS--- | ||
>Ola20972 | ||
----MESQVRQNYHRDCEAAINRMVNMELFASYTYTSMAFYFDRDDVALPGFSHFFKENS | ||
HEEKEHADKLLSFQNKRGGRIFLQDVKKPERDEWG--SGLEAMQCALQLEKNVNQALLDL | ||
HKVASDHKDPHMCDFLETHYLNEQVESIKKIGDHITNLTRMDAHTNKMAEYLFDKHTLGS | ||
KS--- |
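The FASTA alignment above is the test input. As a rough illustration of the two gap-only modes listed in the tool's `mode_selector` options (`-nogaps` and `-noallgaps`), the Python sketch below counts gaps per column and drops columns accordingly. It is a toy under stated assumptions, not trimAl's implementation: the helper names (`read_fasta`, `gap_counts`, `trim_by_gaps`) and the three-sequence alignment are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence), joining wrapped lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def gap_counts(seqs):
    """Number of sequences that have a gap ('-') in each alignment column."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    return [sum(s[i] == "-" for s in seqs) for i in range(length)]


def trim_by_gaps(seqs, mode):
    """Drop columns by gap content (toy versions of -nogaps / -noallgaps)."""
    counts = gap_counts(seqs)
    if mode == "nogaps":
        # Keep only columns in which no sequence has a gap.
        keep = [c == 0 for c in counts]
    elif mode == "noallgaps":
        # Keep every column unless all sequences have a gap there.
        keep = [c < len(seqs) for c in counts]
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return ["".join(ch for ch, k in zip(s, keep) if k) for s in seqs]


# Hypothetical toy alignment; sequence "b" is wrapped over two lines.
toy = """>a
AC-T-
>b
A--
T-
>c
ACGT-
"""
records = read_fasta(toy)
seqs = [seq for _, seq in records]
print(gap_counts(seqs))
print(trim_by_gaps(seqs, "nogaps"))
print(trim_by_gaps(seqs, "noallgaps"))
```

The other modes (`-gappyout`, `-strict`, `-strictplus`, `-automated1`) pick their thresholds automatically and are not covered by this sketch.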
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
#MEGA | ||
!Title ./test-data/example.009.AA.fasta; | ||
!Format DataType=protein NSeqs=9 Nsites=174 indel=- CodeTable=Standard; | ||
|
||
#Csa004271 | ||
---------- ---------- ---------M YMAMGHFFDR DDVALKNISE | ||
YFKECSEEER EHANKMIEFH NKRGGTTTYF PIKAPGSFDP ANTIKAMNCA | ||
LALEVNVNKS LLALHE--TA NGDPEFQDFI EANFLHEQVD AIKKLKDYIT | ||
NLKLVG---T GLGEFLFDKH FKSS | ||
|
||
#Xtr21234 | ||
MISQVRQNYS HDCEAAVNRM VNLEMYASYT YLSMSHYFDR DDVALHHVAE | ||
FFKEQSKEER ECAEKLMKCQ NKRGGRIVLQ DIKKPERDEW GSTLDAMQTA | ||
LDLEKHVNQA LLDLHNLATE RKDPHICDFL ESEHLDEQVK HMKKFGDHIT | ||
NLKRLGVPQN GMGEYLFDKH SLS- | ||
|
||
#LcaH | ||
MSSQVRQNFH QDCEAAINRQ INLELYASYV YLSMAYYFDR DDQALHNFAK | ||
FFRHQSHEER EHAEKLMKLQ NQRGGRIFLQ DVRKPDRDEW GSGVEALECA | ||
LQLEKSVNQS LLDLHKLCSD HNDPHLCDFI ETHYLDEQVK SIKELADWVT | ||
NLRRMGAPQN GMAEYLFDKH TLGK | ||
|
||
#Hsa167996 | ||
STSQVRQNYH QDSEAAINRQ INLELYASYV YLSMSYYFDR DDVALKNFAK | ||
YFLHQSHEER EHAEKLMKLQ NQRGGRIFLQ DIKKPDCDDW ESGLNAMECA | ||
LHLEKNVNQS LLELHKLATD KNDPHLCDFI ETHYLNEQVK AIKELGDHVT | ||
NLRKMGAPES GLAEYLFDKH TLGD | ||
|
||
#Mmu024661 | ||
SPSQVRQNYH QDAEAAINRQ INLELYASYV YLSMSCYFDR DDVALKNFAK | ||
YFLHQSHEER EHAEKLMKLQ NQRGGRIFLQ DIKKPDRDDW ESGLNAMECA | ||
LHLEKSVNQS LLELHKLATD KNDPHLCDFI ETYYLSEQVK SIKELGDHVT | ||
NLRKMGAPEA GMAEYLFDKH TLGH | ||
|
||
#Dre37936 | ||
ETSQIRQNYV RDCEAAINKM INLELYAGYT YTSMAHYFKR DDVALPGFAK | ||
FFKKNSEEER EHAEKFMEFQ NKRGGRIVLQ DIKKPDRDVW GNGLIAMQCA | ||
LQLEKNVNQA LLDLHKLATE MGDPHLCDFL ETHYLNEQVE AIKKLGDHIT | ||
NLSKMDAGNN RMAEYLFDKH TLDS | ||
|
||
#LcaM | ||
MESQVRQNYH RDCEAAVNRM VNMEMFASYT YTSMAFYFSR DDVALPGFSH | ||
FFKENSDEER EHAEKLLSFQ NKRGGHIFLQ DIKKPERDEW GSGLEAMQCA | ||
LQLKKNVNQA LLDLHKLASD HGDPHLCDFL ETHYLNEQVE AIKKLGDYIS | ||
NLSRMDAQKN KMAEYLFDKH SLGG | ||
|
||
#Tru14292 | ||
MESQVRQNYH RDCEAAINKM INMELYASYT YTSMAFFFSR DDVALPGFAH | ||
FFKENSDEER EHAEKLLSFQ NKRGGRIFLQ DIKKPERDEW GSGLEAMQCA | ||
LQLEKKVNQA LLDLHKLASD HVDPHLCDFL ESHYLNEQVE AIKKLGDYIT | ||
NLSRMDAQNN KMAEYLFDKH TLGS | ||
|
||
#Ola20972 | ||
MESQVRQNYH RDCEAAINRM VNMELFASYT YTSMAFYFDR DDVALPGFSH | ||
FFKENSHEEK EHADKLLSFQ NKRGGRIFLQ DVKKPERDEW GSGLEAMQCA | ||
LQLEKNVNQA LLDLHKVASD HKDPHMCDFL ETHYLNEQVE SIKKIGDHIT | ||
NLTRMDAHTN KMAEYLFDKH TLGS | ||
|
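The first test in the tool XML compares its output against the MEGA file above with `lines_diff="2"`. This is presumably because the `!Title` line embeds the input path, which differs between runs. The sketch below counts added and removed lines in a unified diff, where one changed line shows up as one removal plus one addition. That this mirrors Galaxy's own comparison is an assumption, and both `count_diff_lines` and the `/tmp/...` path are hypothetical.

```python
import difflib
from itertools import islice


def count_diff_lines(expected, actual):
    """Count added and removed lines in a unified diff of two line lists."""
    diff = difflib.unified_diff(expected, actual, "expected", "actual", lineterm="")
    # The first two diff lines are the '---' / '+++' file headers; skip them.
    body = islice(diff, 2, None)
    return sum(1 for line in body if line.startswith(("+", "-")))


expected = [
    "#MEGA",
    "!Title ./test-data/example.009.AA.fasta;",
    "!Format DataType=protein NSeqs=9 Nsites=174 indel=- CodeTable=Standard;",
]
actual = list(expected)
actual[1] = "!Title /tmp/job_dir/dataset_1.dat;"  # hypothetical runtime path

print(count_diff_lines(expected, actual))
```

A single differing line therefore needs a tolerance of two diff lines, which matches the value used in the test.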
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,131 @@ | ||
<tool id="trimal" name="trimAl" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="23.0"> | ||
<description>for automated alignment trimming</description> | ||
<macros> | ||
<token name="@TOOL_VERSION@">1.5</token> | ||
<token name="@VERSION_SUFFIX@">0</token> | ||
</macros> | ||
<xrefs> | ||
<xref type="bio.tools">trimal</xref> | ||
</xrefs> | ||
<requirements> | ||
<requirement type="package" version="@TOOL_VERSION@">trimal</requirement> | ||
</requirements> | ||
<command detect_errors="exit_code"><![CDATA[ | ||
trimal -in '$in' -out '$trimmed_output' -htmlout '$html_summary' ${out_format_selector} | ||
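#if $trimming_mode.mode_selector == "custom" | ||
#if str($trimming_mode.gapthreshold) != '' | ||
-gapthreshold $trimming_mode.gapthreshold | ||
#end if | ||
#if str($trimming_mode.simthreshold) != '' | ||
-simthreshold $trimming_mode.simthreshold | ||
#end if | ||
#if str($trimming_mode.cons) != '' | ||
-cons $trimming_mode.cons | ||
#end if | ||
#else | ||
$trimming_mode.mode_selector | ||
#end if | ||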
]]></command> | ||
<inputs> | ||
<param argument="-in" type="data" format="fasta,clustal,pir,phylip,nexus,mega" label="Alignment file (CLUSTAL, FASTA, NBRF/PIR, NEXUS, MEGA, PHYLIP3.2, PHYLIP)" /> | ||
<conditional name="trimming_mode"> | ||
<param name="mode_selector" type="select" label="Select trimming mode from the list"> | ||
<option value="-nogaps">nogaps - remove all columns that contain gaps.</option> | ||
<option value="-noallgaps">noallgaps - remove columns composed only of gaps.</option> | ||
<option value="-gappyout">gappyout - trim using only information about the distribution of gaps.</option> | ||
<option value="-strict">strict - combine gappyout trimming with subsequent trimming based on an automatically selected similarity threshold.</option> | ||
<option value="-strictplus">strictplus - very similar to the strict method, but the final step of the algorithm is slightly different.</option> | ||
<option value="-automated1">automated1 - heuristic that chooses the automatic trimming method best suited to a given alignment.</option> | ||
<option value="custom">custom - trim using a user-defined gap threshold, similarity threshold and minimum conservation percentage.</option> | ||
</param> | ||
<when value="-nogaps" /> | ||
<when value="-noallgaps"/> | ||
<when value="-gappyout"/> | ||
<when value="-strict"/> | ||
<when value="-strictplus"/> | ||
<when value="-automated1"/> | ||
<when value="custom"> | ||
<param argument="-gapthreshold" type="float" optional="true" value="0.9" min="0.0" max="1.0" label="Gap threshold" help="1 - (fraction of sequences with a gap allowed)."/> | ||
<param argument="-simthreshold" type="float" optional="true" value="0.9" min="0.0" max="1.0" label="Similarity threshold" help="Minimum average similarity allowed."/> | ||
<param argument="-cons" type="integer" optional="true" value="50" min="0" max="100" label="Minimum conservation percentage" help="Minimum percentage of the positions in the original alignment to conserve."/> | ||
</when> | ||
</conditional> | ||
<param name="out_format_selector" type="select" label="Select trimmed alignment output format from the list"> | ||
<option value="-clustal">CLUSTAL format</option> | ||
<option value="-fasta">FASTA format</option> | ||
<option value="-fasta_m10">FASTA format, with sequence names limited to 10 characters</option> | ||
<option value="-nbrf">NBRF/PIR format</option> | ||
<option value="-nexus">NEXUS format</option> | ||
<option value="-mega">MEGA format</option> | ||
<option value="-phylip">PHYLIP/PHYLIP4 format</option> | ||
<option value="-phylip_m10">PHYLIP/PHYLIP4 format, with sequence names limited to 10 characters</option> | ||
<option value="-phylip_paml">PHYLIP format compatible with PAML</option> | ||
<option value="-phylip_paml_m10">PHYLIP format compatible with PAML, with sequence names limited to 10 characters</option> | ||
<option value="-phylip3.2">PHYLIP3.2 format</option> | ||
<option value="-phylip3.2_m10">PHYLIP3.2 format, with sequence names limited to 10 characters</option> | ||
</param> | ||
</inputs> | ||
<outputs> | ||
<data name="trimmed_output" format="fasta" label="Trimmed alignment."> | ||
<change_format> | ||
<when input="out_format_selector" value="-fasta" format="fasta" /> | ||
<when input="out_format_selector" value="-fasta_m10" format="fasta" /> | ||
<when input="out_format_selector" value="-phylip" format="phylip" /> | ||
<when input="out_format_selector" value="-phylip_m10" format="phylip" /> | ||
<when input="out_format_selector" value="-phylip_paml" format="phylip" /> | ||
<when input="out_format_selector" value="-phylip_paml_m10" format="phylip" /> | ||
<when input="out_format_selector" value="-phylip3.2" format="phylip" /> | ||
<when input="out_format_selector" value="-phylip3.2_m10" format="phylip" /> | ||
<when input="out_format_selector" value="-clustal" format="clustal" /> | ||
<when input="out_format_selector" value="-mega" format="mega" /> | ||
<when input="out_format_selector" value="-nbrf" format="pir" /> | ||
<when input="out_format_selector" value="-nexus" format="nexus" /> | ||
</change_format> | ||
</data> | ||
<data name="html_summary" format="html" label="trimAl HTML summary"/> | ||
</outputs> | ||
<tests> | ||
<test expect_num_outputs="2"> | ||
<param name="in" value="example.009.AA.fasta"/> | ||
<param name="mode_selector" value="-gappyout" /> | ||
<param name="out_format_selector" value="-mega" /> | ||
<output name="trimmed_output" file="trimmed_example.009.AA.mega" lines_diff="2"/> | ||
<output name="html_summary" file="trimmed_example.009.AA.html"/> | ||
</test> | ||
<test expect_num_outputs="2"> | ||
<param name="in" value="example.009.AA.fasta"/> | ||
<param name="mode_selector" value="custom" /> | ||
<param name="gapthreshold" value="0.5" /> | ||
<param name="simthreshold" value="0.5" /> | ||
<param name="cons" value="5" /> | ||
<param name="out_format_selector" value="-phylip_paml_m10" /> | ||
<output name="trimmed_output" file="custom_trimmed_example.009.AA.phy" ftype="phylip"/> | ||
<output name="html_summary" file="custom_trimmed_example.009.AA.html"/> | ||
</test> | ||
</tests> | ||
<help><![CDATA[ | ||
trimAl is a tool for the automated removal of spurious sequences or poorly aligned regions from a multiple sequence alignment. | ||
trimAl can consider several parameters, alone or in combination, to select the most reliable positions in the alignment. | ||
These include the proportion of sequences with a gap, the level of residue similarity and, if several alignments for the same set of sequences are provided, | ||
the consistency level of columns among alignments. trimAl also lets you manually select a set of columns and sequences to be removed from the alignment. | ||
trimAl implements a series of automated algorithms that search for optimal trimming thresholds based on inherent characteristics | ||
of the input alignment, so that the signal-to-noise ratio increases after trimming. Learn more about the available trimming modes here: https://trimal.readthedocs.io/en/latest/algorithms.html | ||
Among its additional features, trimAl allows you to: | ||
- get the complementary alignment (the columns that were trimmed), | ||
- compute statistics from the alignment, | ||
- select the output file format, | ||
- get a summary of the trimming in HTML and SVG formats, | ||
- and use many other options. | ||
trimAl webpage: https://trimal.readthedocs.io | ||
License | ||
------- | ||
This program is free software: you can redistribute it and/or modify | ||
it under the terms of the GNU General Public License as published by | ||
the Free Software Foundation, the latest available version. | ||
]]></help> | ||
<citations> | ||
<citation type="doi">10.1093/bioinformatics/btp348</citation> | ||
</citations> | ||
</tool> |
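To make the `<command>` template above easier to follow, here is a minimal Python sketch of the command line it assembles. It uses only the flags that appear in the tool XML. The function name `build_trimal_command`, the output file names and the choice to require all three thresholds in custom mode are assumptions made for illustration, not part of the wrapper.

```python
import shlex

# Preset trimming modes offered by the tool's mode_selector parameter.
PRESET_MODES = {
    "-nogaps", "-noallgaps", "-gappyout",
    "-strict", "-strictplus", "-automated1",
}


def build_trimal_command(in_path, out_path, html_path, out_format, mode,
                         gapthreshold=None, simthreshold=None, cons=None):
    """Return a shell-quoted trimal command line (hypothetical helper)."""
    argv = ["trimal", "-in", in_path, "-out", out_path,
            "-htmlout", html_path, out_format]
    if mode == "custom":
        # Assumption for this sketch: custom mode gets all three thresholds.
        if None in (gapthreshold, simthreshold, cons):
            raise ValueError("custom mode needs gapthreshold, simthreshold and cons")
        argv += ["-gapthreshold", str(gapthreshold),
                 "-simthreshold", str(simthreshold),
                 "-cons", str(cons)]
    elif mode in PRESET_MODES:
        # Preset modes are passed through as a single flag.
        argv.append(mode)
    else:
        raise ValueError(f"unknown trimming mode: {mode}")
    return shlex.join(argv)


# Parameters taken from the two tests in the tool XML; output names are made up.
preset_cmd = build_trimal_command(
    "example.009.AA.fasta", "trimmed.mega", "summary.html", "-mega", "-gappyout")
custom_cmd = build_trimal_command(
    "example.009.AA.fasta", "trimmed.phy", "summary.html", "-phylip_paml_m10",
    "custom", gapthreshold=0.5, simthreshold=0.5, cons=5)
print(preset_cmd)
print(custom_cmd)
```

Building the arguments as a list and quoting them with `shlex.join` plays the same role as the single quotes around `$in`, `$trimmed_output` and `$html_summary` in the template.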