Goalign: toolkit and api for alignment manipulation

Github repository

Introduction

Goalign is a set of command line tools to manipulate multiple alignments. It is implemented in Go.

The goal is to handle multiple alignments in different input and output formats (Fasta, Phylip, Clustal and Nexus) through several basic commands. Each command may print its result (usually an alignment) to standard output, so it can be piped to the standard input of the next Goalign command.

Installation

Binaries

Precompiled binaries for the latest release can be downloaded from the release section. Binaries are available for macOS, Linux, and Windows (32 and 64 bits).

Once downloaded, the executable runs as is; nothing else needs to be installed.

From sources

To compile Goalign, first download and install Go on your system.

Then type:

go get github.com/evolbioinfo/goalign/

This downloads the Goalign sources from GitHub, along with all their dependencies.

You can then build it with:

cd $GOPATH/src/github.com/evolbioinfo/goalign/
make

The goalign executable should be located in the $GOPATH/bin folder.

Commands

Here is the list of all commands, each with a link to its full description and a link to a snippet that does the same thing in Go. Almost all commands accept the following options:

-p: input is in phylip format (default fasta). Output format will also be phylip in this case;
-x: input is in nexus format (default fasta), lower priority than -p. Output format will also be nexus in this case;
-u: input is in clustal format (default fasta), lower priority than -p and -x. Output format will also be clustal in this case;
--input-strict: if -p is also given, then input is considered strict phylip, i.e.:
- sequence names are at most 10 characters long. goalign removes spaces in sequence names;
- the sequence starts at position 11 (just after the sequence name).
--output-strict: if -p is also given, then output alignments are written in strict phylip format, i.e.:
- sequence names are at most 10 characters long, otherwise they are truncated;
--no-block: if -p is also given, then output alignments are written in phylip, without 10 character block separation.
--one-line: if -p is also given, then output alignments are written in phylip, on a single line.
--auto-detect (overrides -p, -u and -x): It will test input formats in the following order:
1. Fasta
2. Nexus
3. Clustal
4. Phylip
If none of these formats is recognized, goalign exits with an error. Please also note that in --auto-detect mode, phylip format is considered as not strict.
--alphabet: Used to specify which alphabet must be used to parse the alignment. It can be auto (default), aa, or nt. By default, the alphabet is deduced from the content of the input file. In the case of nexus format, when --alphabet auto is specified, the alphabet specified in the nexus file is used. Otherwise, this option overrides the nexus file alphabet.
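Two of the rules above can be sketched in a few lines of Go: strict Phylip naming and the order in which formats are tried. This is only an illustration under stated assumptions, not Goalign's implementation. `strictPhylipName`, `detectFormat` and `isPhylipHeader` are hypothetical names, the first-line heuristics are assumptions of this sketch, and folding the input rule (spaces removed) and the output rule (truncation to 10 characters) into one helper is a simplification.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// strictPhylipName (hypothetical) removes spaces, keeps at most 10
// characters, and pads to 10 so that the sequence starts at position 11.
func strictPhylipName(name string) string {
	r := []rune(strings.ReplaceAll(name, " ", ""))
	if len(r) > 10 {
		r = r[:10]
	}
	return fmt.Sprintf("%-10s", string(r))
}

// isPhylipHeader reports whether line looks like a Phylip header,
// i.e. two positive integers (number of sequences, alignment length).
func isPhylipHeader(line string) bool {
	fields := strings.Fields(line)
	if len(fields) != 2 {
		return false
	}
	for _, f := range fields {
		if n, err := strconv.Atoi(f); err != nil || n <= 0 {
			return false
		}
	}
	return true
}

// detectFormat (hypothetical) tries the formats in the documented order:
// Fasta, Nexus, Clustal, then Phylip. The checks on the first line are
// assumptions made for this sketch, not Goalign's actual parsers.
func detectFormat(content string) (string, error) {
	first := strings.SplitN(strings.TrimSpace(content), "\n", 2)[0]
	first = strings.TrimSpace(first)
	switch {
	case strings.HasPrefix(first, ">"):
		return "fasta", nil
	case strings.HasPrefix(strings.ToUpper(first), "#NEXUS"):
		return "nexus", nil
	case strings.HasPrefix(first, "CLUSTAL"):
		return "clustal", nil
	case isPhylipHeader(first):
		return "phylip", nil
	}
	return "", fmt.Errorf("unrecognized alignment format")
}

func main() {
	fmt.Printf("%q\n", strictPhylipName("Homo sapiens mito"))
	format, err := detectFormat(" 2 4\nseq1      ACGT\nseq2      ACGA\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(format)
}
```

Trying Phylip last is a natural choice, since its header (two integers) is the least distinctive of the four signatures.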

Output is written to stdout by default, but can generally be written to a file with the -o option. If the given output file has a .gz or .xz extension, the output is compressed accordingly.

Command	Subcommand	Description
addid (api)		Adds a string to each sequence identifier of the input alignment
append (api)		Concatenates several alignments by adding new alignments as new sequences of the first alignment
build (api)		Commands to build output files (bootstrap, for example)
--	distboot	Builds bootstrap distances matrices from input alignment (nt only)
--	seqboot	Builds bootstrap alignments from input alignment
clean (api)		Removes gap sites/sequences
--	sites	Removes sites with gaps
--	seqs	Removes sequences with gaps
codonalign (api)		Adds gaps in nt sequences, according to its corresponding protein alignment
compress (api)		Removes identical patterns/sites from an input alignment
compute (api)		Different computations (distances, entropy, etc.)
--	distance	Computes distance matrix from input alignment
--	entropy	Computes entropy of sites of a given alignment
--	pssm	Computes and prints a Position specific scoring matrix
concat (api)		Concatenates a set of alignments
consensus (api)		Computes a basic majority consensus sequence
extract		Extracts sub-sequences from an input alignment
completion		Generates auto-completion commands for bash or zsh
dedup (api)		Deduplicate/Remove identical sequences
diff (api)		Compares all sequences of an alignment to the first one, and counts differences
divide (api)		Divide an input alignment in several output files
draw (api)		Draws an input alignment
--	biojs	Displays an input alignment in an html file using biojs
--	png	Displays an input alignment in a png file
identical (api)		Tells whether two alignments are identical
mask (api)		Mask (with N or X) positions of input alignment
mutate (api)		Adds substitutions (~sequencing errors), or gaps, uniformly in an input alignment
--	gaps	Adds gaps uniformly in an input alignment
--	snvs	Adds substitutions uniformly in an input alignment
orf (api)		Finds the longest ORF in all given sequences, on the forward strand
phase (api)		Finds the best start positions by aligning to translated reference sequences, and sets them as new start positions
phasent (api)		Finds the best start positions by aligning to reference sequences, and sets them as new start positions
random (api)		Generate random sequences
reformat (api)		Reformats input alignment into phylip or fasta format
--	clustal	Reformats an input alignment into Clustal
--	fasta	Reformats an input alignment into Fasta
--	nexus	Reformats an input alignment into nexus
--	paml	Reformats an input alignment into PAML input format
--	phylip	Reformats an input alignment into Phylip
--	tnt	Reformats an input alignment into TNT input file
rename (api)		Rename sequences of the input alignment (using a map file, with a regexp, or just clean names)
replace (api)		Replace characters in sequences of input alignment
revcomp (api)		Reverse complements an input alignment
sample (api)		Samples sequences or sites from an input alignment
--	seqs	Samples a subset of sequences from the input alignment
--	sites	Takes a random subalignment
--	rarefy	Takes a sample taking weights into account
shuffle (api)		A set of commands to shuffle an alignment
--	recomb	Recombines sequences in the input alignment (copy/paste)
--	rogue	Simulates rogue taxa
--	seqs	Shuffles sequence order in alignment
--	sites	Shuffles n alignment sites vertically
--	swap	Swaps portion of sequences in the input alignment (cut/paste)
split (api)		Splits an input alignment according to partitions defined in a partition file
sort (api)		Sorts the alignment by sequence name
stats (api)		Prints different characteristics of the alignment
--	alleles	Prints the average number of alleles per sites of the alignment
--	alphabet	Prints the alphabet detected for the alignment
--	char	Prints the frequency of the different characters (aa/nt) of the alignment
--	gaps	Prints statistics about gaps for each sequence of the alignment
--	length	Prints the length of sequences in the alignment
--	mutations	Prints, for each sequence, the number of mutations compared to a reference sequence
--	maxchar	Prints the most frequent character for each alignment site
--	nalign	Prints the number of alignments in the input file (phylip)
--	nseq	Prints the number of sequences in the alignment
--	taxa	Prints index (position) and name of taxa of the alignment file
subseq (api)		Take a sub-alignment from the input alignment
subset (api)		Take a subset of sequences from the input alignment
subsites (api)		Take a subset of the sites from the input alignment
sw (api)		Aligns 2 sequences using the Smith-Waterman algorithm
tolower (api)		Replace upper case characters by lower case characters
toupper (api)		Replace lower case characters by upper case characters
translate (api)		Translates an input sequence into Amino-Acids
transpose (api)		Transposes an input alignment (sequences<=>sites)
trim (api)		This command trims names of sequences or sequences themselves
--	name	Trims names of sequences
--	seq	Trims sequences of the input alignment
unalign (api)		Unaligns input alignment
version		Prints the current version of goalign
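To give an idea of what one of the simpler entries in the table computes, here is a small sketch of reverse complementation applied to every sequence of a nucleotide alignment. It is an illustration only and does not use the Goalign API: `revComp` is a hypothetical name, and leaving gaps, N and any other non-ACGT characters unchanged is an assumption of this sketch.

```go
package main

import "fmt"

// complement maps each nucleotide to its complement. Characters that
// are not listed (gaps, N, ambiguity codes) are left unchanged, which
// is an assumption made for this sketch.
var complement = map[byte]byte{
	'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A',
	'a': 't', 'c': 'g', 'g': 'c', 't': 'a',
}

// revComp (hypothetical) returns the reverse complement of seq.
func revComp(seq string) string {
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		// Read the input from its last character to its first.
		c := seq[len(seq)-1-i]
		if comp, ok := complement[c]; ok {
			c = comp
		}
		out[i] = c
	}
	return string(out)
}

func main() {
	// A toy alignment: all sequences have the same length.
	names := []string{"s1", "s2"}
	seqs := []string{"AAGC-T", "AAGCNT"}
	for i, s := range seqs {
		fmt.Printf(">%s\n%s\n", names[i], revComp(s))
	}
}
```

Since every sequence of an alignment has the same length, reversing each one keeps the columns aligned.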
