Goalign is a set of command-line tools to manipulate multiple alignments. It is implemented in Go.
The goal is to handle multiple alignments in different input and output formats (Fasta, Phylip, Clustal and Nexus) through several basic commands. Each command may print its result (usually an alignment) to the standard output, which can then be piped to the standard input of the next Goalign command.
You can download precompiled binaries for the latest release in the release section. Binaries are available for MacOS, Linux, and Windows (32-bit and 64-bit).
Once downloaded, you can run the executable directly, without any other downloads.
To compile Goalign, you must first download and install Go on your system.
Then type:
go get github.com/evolbioinfo/goalign/
This will download the Goalign sources from GitHub, along with all their dependencies.
You can then build it with:
cd $GOPATH/src/github.com/evolbioinfo/goalign/
make
The `goalign` executable should be located in the `$GOPATH/bin` folder.
Here is the list of all commands, each with a link to its full description and a link to a snippet that does the same in Go. Almost all commands accept the following arguments:
- `-p`: input is in phylip format (default fasta). Output format will also be phylip in this case;
- `-x`: input is in nexus format (default fasta), lower priority than `-p`. Output format will also be nexus in this case;
- `-u`: input is in clustal format (default fasta), lower priority than `-p` and `-x`. Output format will also be clustal in this case;
- `--input-strict`: if `-p` is also given, then input is considered strict phylip, i.e.:
  - sequence names are at most 10 characters long. goalign removes spaces in sequence names;
  - sequences start at position 11 (just after the sequence name).
- `--output-strict`: if `-p` is also given, then output alignments are written in strict phylip format, i.e.:
  - sequence names are at most 10 characters long, otherwise they are truncated;
- `--no-block`: if `-p` is also given, then output alignments are written in phylip, without 10-character block separation.
- `--one-line`: if `-p` is also given, then output alignments are written in phylip, on one single line.
- `--auto-detect` (overrides `-p`, `-u` and `-x`): tests input formats in the following order:
  - Fasta
  - Nexus
  - Clustal
  - Phylip

  If none of these formats is recognized, goalign exits with an error. Please also note that in `--auto-detect` mode, phylip format is considered non-strict.
- `--alphabet`: specifies which alphabet must be used to parse the alignment. It can be `auto` (default), `aa`, or `nt`. By default, the alphabet is deduced from the content of the input file. In the case of nexus format, when `--alphabet auto` is specified, the alphabet specified in the nexus file is used. Otherwise, this option overrides the nexus file alphabet.
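
The detection order used by `--auto-detect` can be illustrated with a minimal Go sketch. This is not goalign's implementation: the `detectFormat` and `isPhylipHeader` helpers are hypothetical, and they only inspect the first non-empty line (assuming a `>` header for Fasta, a `#NEXUS` header for Nexus, a `CLUSTAL` header for Clustal, and two positive integers for Phylip), whereas a real parser would read the whole file.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// isPhylipHeader reports whether a line looks like a Phylip header,
// i.e. exactly two positive integers (number of sequences, length).
func isPhylipHeader(line string) bool {
	fields := strings.Fields(line)
	if len(fields) != 2 {
		return false
	}
	for _, f := range fields {
		if n, err := strconv.Atoi(f); err != nil || n <= 0 {
			return false
		}
	}
	return true
}

// detectFormat is a hypothetical helper: it tests the formats in the
// order Fasta, Nexus, Clustal, Phylip, and returns an error if none
// of them matches the first non-empty line of the content.
func detectFormat(content string) (string, error) {
	first := ""
	for _, line := range strings.Split(content, "\n") {
		if t := strings.TrimSpace(line); t != "" {
			first = t
			break
		}
	}
	// Cases of a Go switch are evaluated from top to bottom,
	// which gives the priority order described above.
	switch {
	case strings.HasPrefix(first, ">"):
		return "fasta", nil
	case strings.HasPrefix(strings.ToUpper(first), "#NEXUS"):
		return "nexus", nil
	case strings.HasPrefix(first, "CLUSTAL"):
		return "clustal", nil
	case isPhylipHeader(first):
		return "phylip", nil
	}
	return "", errors.New("unrecognized alignment format")
}

func main() {
	inputs := []string{
		">seq1\nACGT\n",
		"#NEXUS\nBEGIN DATA;\n",
		"CLUSTAL W (1.82) multiple sequence alignment\n",
		" 2 4\nseq1 ACGT\nseq2 ACGA\n",
		"not an alignment\n",
	}
	for _, in := range inputs {
		format, err := detectFormat(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(format)
	}
}
```

Trying the most distinctive headers first keeps the check cheap; Phylip comes last because its header is the least specific.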
Output is written to stdout by default, but can generally be written to a file with the `-o` option. If the given output file has a `.gz` or `.xz` extension, the output is compressed accordingly.
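
As an illustration of extension-based output compression, here is a minimal Go sketch. It is an assumption about how such a rule could be written, not goalign's own code: `compressionFor`, `gzipBytes` and `gunzipBytes` are hypothetical helpers, and only gzip is actually exercised because the Go standard library has no xz package.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"path/filepath"
)

// compressionFor is a hypothetical helper returning the compression
// implied by the extension of an output file name.
func compressionFor(name string) string {
	switch filepath.Ext(name) {
	case ".gz":
		return "gzip"
	case ".xz":
		return "xz" // would need a third-party package to write
	default:
		return "none"
	}
}

// gzipBytes compresses data with gzip and returns the compressed bytes.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes the remaining data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipBytes decompresses gzip data, used here to check the round trip.
func gunzipBytes(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	for _, name := range []string{"align.fasta", "align.fasta.gz", "align.phy.xz"} {
		fmt.Printf("%s -> %s\n", name, compressionFor(name))
	}
	packed, err := gzipBytes([]byte(">seq1\nACGT\n"))
	if err != nil {
		panic(err)
	}
	unpacked, err := gunzipBytes(packed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", unpacked)
}
```

The file names above are placeholders chosen for the example.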
Command | Subcommand | Description |
---|---|---|
addid (api) | | Adds a string to each sequence identifier of the input alignment |
append (api) | | Concatenates several alignments by adding new alignments as new sequences of the first alignment |
build (api) | | Command to build output files: bootstrap for example |
-- | distboot | Builds bootstrap distance matrices from input alignment (nt only) |
-- | seqboot | Builds bootstrap alignments from input alignment |
clean (api) | | Removes gap sites/sequences |
-- | sites | Removes sites with gaps |
-- | seqs | Removes sequences with gaps |
codonalign (api) | | Adds gaps in nt sequences, according to the corresponding protein alignment |
compress (api) | | Removes identical patterns/sites from an input alignment |
compute (api) | | Different computations (distances, entropy, etc.) |
-- | distance | Computes distance matrix from input alignment |
-- | entropy | Computes entropy of sites of a given alignment |
-- | pssm | Computes and prints a position-specific scoring matrix |
concat (api) | | Concatenates a set of alignments |
consensus (api) | | Computes a basic majority consensus sequence |
extract | | Extracts sub-sequences from an input alignment |
completion | | Generates auto-completion commands for bash or zsh |
dedup (api) | | Deduplicates/removes identical sequences |
diff (api) | | Compares all sequences of an alignment to the first one, and counts differences |
divide (api) | | Divides an input alignment into several output files |
draw (api) | | Draws an input alignment |
-- | biojs | Displays an input alignment in an html file using biojs |
-- | png | Displays an input alignment in a png file |
identical (api) | | Tells whether two alignments are identical |
mask (api) | | Masks (with N or X) positions of the input alignment |
mutate (api) | | Adds substitutions (~sequencing errors), or gaps, uniformly in an input alignment |
-- | gaps | Adds gaps uniformly in an input alignment |
-- | snvs | Adds substitutions uniformly in an input alignment |
orf (api) | | Finds the longest ORF in all given sequences in the forward strand |
phase (api) | | Finds best starts by aligning to translated ref sequences and sets them as new start positions |
phasent (api) | | Finds best starts by aligning to ref sequences and sets them as new start positions |
random (api) | | Generates random sequences |
reformat (api) | | Reformats input alignment into another format (phylip, fasta, etc.) |
-- | clustal | Reformats an input alignment into Clustal |
-- | fasta | Reformats an input alignment into Fasta |
-- | nexus | Reformats an input alignment into Nexus |
-- | paml | Reformats an input alignment into PAML input format |
-- | phylip | Reformats an input alignment into Phylip |
-- | tnt | Reformats an input alignment into TNT input file |
rename (api) | | Renames sequences of the input alignment (using a map file, with a regexp, or just clean names) |
replace (api) | | Replaces characters in sequences of the input alignment |
revcomp (api) | | Reverse complements an input alignment |
sample (api) | | Samples sequences or sites from an input alignment |
-- | seqs | Samples a subset of sequences from the input alignment |
-- | sites | Takes a random subalignment |
-- | rarefy | Takes a sample taking weights into account |
shuffle (api) | | A set of commands to shuffle an alignment |
-- | recomb | Recombines sequences in the input alignment (copy/paste) |
-- | rogue | Simulates rogue taxa |
-- | seqs | Shuffles sequence order in alignment |
-- | sites | Shuffles n alignment sites vertically |
-- | swap | Swaps portions of sequences in the input alignment (cut/paste) |
split (api) | | Splits an input alignment according to partitions defined in a partition file |
sort (api) | | Sorts the alignment by sequence name |
stats (api) | | Prints different characteristics of the alignment |
-- | alleles | Prints the average number of alleles per site of the alignment |
-- | alphabet | Prints the alphabet detected for the alignment |
-- | char | Prints the frequency of different characters (aa/nt) of the alignment |
-- | gaps | Prints statistics about gaps for each sequence of the alignment |
-- | length | Prints the length of sequences in the alignment |
-- | mutations | Prints, for each sequence, the number of mutations compared to a reference sequence |
-- | maxchar | Prints the character with the maximum occurrence at each alignment site |
-- | nalign | Prints the number of alignments in the input file (phylip) |
-- | nseq | Prints the number of sequences in the alignment |
-- | taxa | Prints index (position) and name of taxa of the alignment file |
subseq (api) | | Takes a sub-alignment from the input alignment |
subset (api) | | Takes a subset of sequences from the input alignment |
subsites (api) | | Takes a subset of the sites from the input alignment |
sw (api) | | Aligns 2 sequences using the Smith-Waterman algorithm |
tolower (api) | | Replaces upper case characters by lower case characters |
toupper (api) | | Replaces lower case characters by upper case characters |
translate (api) | | Translates an input sequence into amino acids |
transpose (api) | | Transposes an input alignment (sequences<=>sites) |
trim (api) | | Trims names of sequences or sequences themselves |
-- | name | Trims names of sequences |
-- | seq | Trims sequences of the input alignment |
unalign (api) | | Unaligns input alignment |
version | | Prints the current version of goalign |
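
To make one of the operations listed above concrete, here is a minimal Go sketch of what reverse complementing an alignment (as in `revcomp`) amounts to. It is only an illustration and does not use goalign's API: the `reverseComplement` function and the `complement` table are hypothetical, and the sketch assumes only A, C, G, T, N and the gap character need handling, leaving any other character unchanged.

```go
package main

import "fmt"

// complement maps each nucleotide to its complement. Gaps and N map to
// themselves. This table is an assumption of the sketch; other
// characters (e.g. IUPAC ambiguity codes) are not handled here.
var complement = map[byte]byte{
	'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A', 'N': 'N', '-': '-',
	'a': 't', 'c': 'g', 'g': 'c', 't': 'a', 'n': 'n',
}

// reverseComplement reads the sequence from its last position to its
// first and writes the complement of each character.
func reverseComplement(seq string) string {
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		c := seq[len(seq)-1-i]
		if comp, ok := complement[c]; ok {
			out[i] = comp
		} else {
			out[i] = c // unknown characters are kept as-is in this sketch
		}
	}
	return string(out)
}

func main() {
	// A toy alignment: all sequences have the same length, so reversing
	// each of them keeps the columns aligned.
	alignment := []struct{ name, seq string }{
		{"seq1", "AACG-T"},
		{"seq2", "AACGGT"},
	}
	for _, s := range alignment {
		fmt.Printf(">%s\n%s\n", s.name, reverseComplement(s.seq))
	}
}
```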