
Goalign: toolkit and API for alignment manipulation

GitHub repository

Goalign GitHub repository.

Introduction

Goalign is a set of command line tools to manipulate multiple alignments. It is implemented in Go.

The goal is to handle multiple alignments in different input and output formats (Fasta, Phylip, Clustal and Nexus) through several basic commands. Each command usually prints its result (most often an alignment) to standard output, so it can be piped to the standard input of the next Goalign command.

Installation

Binaries

You can download already compiled binaries for the latest release in the release section. Binaries are available for macOS, Linux, and Windows (32 and 64 bits).

Once downloaded, the executable can be run directly, without installing anything else.

From sources

In order to compile goalign, you must first download and install Go on your system.

Then type:

go get github.com/evolbioinfo/goalign/

This will download Goalign sources from github, and all its dependencies.

You can then build it with:

cd $GOPATH/src/github.com/evolbioinfo/goalign/
make

The goalign executable should be located in the $GOPATH/bin folder.

Commands

Here is the list of all commands, each with a link to its full description and a link to a Go snippet that does the same thing through the API. Almost all commands accept the following arguments:

  • -p: input is in Phylip format (default: Fasta). The output format is then also Phylip;
  • -x: input is in Nexus format (default: Fasta), lower priority than -p. The output format is then also Nexus;
  • -u: input is in Clustal format (default: Fasta), lower priority than -p and -x. The output format is then also Clustal;
  • --input-strict: if -p is also given, the input is considered strict Phylip, i.e.:
    • sequence names are at most 10 characters long; goalign removes spaces in sequence names;
    • sequences start at position 11 (just after the sequence name).
  • --output-strict: if -p is also given, output alignments are written in strict Phylip format, i.e.:
    • sequence names are at most 10 characters long; longer names are truncated;
  • --no-block: if -p is also given, output alignments are written in Phylip without the 10-character block separation;
  • --one-line: if -p is also given, output alignments are written in Phylip, with each sequence on a single line;
  • --auto-detect (overrides -p, -u and -x): tests input formats in the following order:
    1. Fasta
    2. Nexus
    3. Clustal
    4. Phylip
    If none of these formats is recognized, goalign exits with an error. Note also that in --auto-detect mode, Phylip input is considered non-strict.
  • --alphabet: Used to specify which alphabet must be used to parse the alignment. It can be auto (default), aa, or nt. By default, the alphabet is deduced from the content of the input file. In the case of nexus format, when --alphabet auto is specified, the alphabet specified in the nexus file is used. Otherwise, this option overrides the nexus file alphabet.
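The strict Phylip rules listed for --input-strict and --output-strict can be sketched in Go as follows. This is an illustration under stated assumptions, not Goalign's parser: the helper names are hypothetical, names are assumed to be ASCII, spaces in the sequence part are treated as block separators and dropped, and output names are padded with spaces to 10 characters so that the sequence starts at position 11.

```go
package main

import (
	"fmt"
	"strings"
)

// strictNameWidth is the width of the name field in strict Phylip.
const strictNameWidth = 10

// parseStrictPhylipLine splits one data line of a strict Phylip file:
// the first 10 characters are the name (spaces removed) and the sequence
// starts at position 11. Spaces in the sequence part are treated as block
// separators and removed. Byte slicing assumes ASCII input.
func parseStrictPhylipLine(line string) (name, seq string, err error) {
	if len(line) < strictNameWidth {
		return "", "", fmt.Errorf("line shorter than %d characters: %q", strictNameWidth, line)
	}
	name = strings.ReplaceAll(line[:strictNameWidth], " ", "")
	seq = strings.ReplaceAll(line[strictNameWidth:], " ", "")
	return name, seq, nil
}

// strictName truncates a name to 10 characters and pads it with spaces,
// so that the sequence written right after it starts at position 11.
func strictName(name string) string {
	r := []rune(name)
	if len(r) > strictNameWidth {
		r = r[:strictNameWidth]
	}
	return fmt.Sprintf("%-*s", strictNameWidth, string(r))
}

func main() {
	name, seq, err := parseStrictPhylipLine("Seq 1     ACGTA CGTAC")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q %q\n", name, seq)                             // "Seq1" "ACGTACGTAC"
	fmt.Printf("%q\n", strictName("VeryLongSequenceName")+"ACGT") // "VeryLongSeACGT"
}
```

Truncating on output means two long names sharing their first 10 characters become identical, which is one reason a tool may prefer non-strict Phylip by default.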

Output is written to stdout by default, but can generally be written to a file with the -o option. If the output file name has a .gz or .xz extension, the output is compressed accordingly.

| Command | Subcommand | Description |
|---------|------------|-------------|
| addid (api) | | Adds a string to each sequence identifier of the input alignment |
| append (api) | | Concatenates several alignments by adding new alignments as new sequences of the first alignment |
| build (api) | | Builds output files, e.g. bootstrap replicates |
| | distboot | Builds bootstrap distance matrices from the input alignment (nt only) |
| | seqboot | Builds bootstrap alignments from the input alignment |
| clean (api) | | Removes gap sites/sequences |
| | sites | Removes sites with gaps |
| | seqs | Removes sequences with gaps |
| codonalign (api) | | Adds gaps in nt sequences, according to their corresponding protein alignment |
| compress (api) | | Removes identical patterns/sites from an input alignment |
| compute (api) | | Different computations (distances, entropy, etc.) |
| | distance | Computes a distance matrix from the input alignment |
| | entropy | Computes the entropy of the sites of an alignment |
| | pssm | Computes and prints a position-specific scoring matrix |
| concat (api) | | Concatenates a set of alignments |
| consensus (api) | | Computes a basic majority consensus sequence |
| extract | | Extracts sub-sequences from an input alignment |
| completion | | Generates auto-completion commands for bash or zsh |
| dedup (api) | | Deduplicates (removes identical) sequences |
| diff (api) | | Compares all sequences of an alignment to the first one, and counts differences |
| divide (api) | | Divides an input alignment into several output files |
| draw (api) | | Draws an input alignment |
| | biojs | Displays an input alignment in an HTML file using BioJS |
| | png | Displays an input alignment in a PNG file |
| identical (api) | | Tells whether two alignments are identical |
| mask (api) | | Masks (with N or X) positions of the input alignment |
| mutate (api) | | Adds substitutions (~sequencing errors) or gaps uniformly in an input alignment |
| | gaps | Adds gaps uniformly in an input alignment |
| | snvs | Adds substitutions uniformly in an input alignment |
| orf (api) | | Finds the longest ORF in all given sequences, on the forward strand |
| phase (api) | | Finds the best start positions by aligning to translated reference sequences, and sets them as the new start positions |
| phasent (api) | | Finds the best start positions by aligning to reference sequences, and sets them as the new start positions |
| random (api) | | Generates random sequences |
| reformat (api) | | Reformats an input alignment into another format |
| | clustal | Reformats an input alignment into Clustal |
| | fasta | Reformats an input alignment into Fasta |
| | nexus | Reformats an input alignment into Nexus |
| | paml | Reformats an input alignment into PAML input format |
| | phylip | Reformats an input alignment into Phylip |
| | tnt | Reformats an input alignment into a TNT input file |
| rename (api) | | Renames sequences of the input alignment (using a map file, with a regexp, or just by cleaning names) |
| replace (api) | | Replaces characters in sequences of the input alignment |
| revcomp (api) | | Reverse-complements an input alignment |
| sample (api) | | Samples sequences or sites from an input alignment |
| | seqs | Samples a subset of sequences from the input alignment |
| | sites | Takes a random sub-alignment |
| | rarefy | Takes a sample, taking weights into account |
| shuffle (api) | | A set of commands to shuffle an alignment |
| | recomb | Recombines sequences in the input alignment (copy/paste) |
| | rogue | Simulates rogue taxa |
| | seqs | Shuffles the sequence order in the alignment |
| | sites | Shuffles n alignment sites vertically |
| | swap | Swaps portions of sequences in the input alignment (cut/paste) |
| split (api) | | Splits an input alignment according to partitions defined in a partition file |
| sort (api) | | Sorts the alignment by sequence name |
| stats (api) | | Prints different characteristics of the alignment |
| | alleles | Prints the average number of alleles per site of the alignment |
| | alphabet | Prints the alphabet detected for the alignment |
| | char | Prints the frequency of each character (aa/nt) in the alignment |
| | gaps | Prints statistics about gaps for each sequence of the alignment |
| | length | Prints the length of the sequences in the alignment |
| | mutations | Prints, for each sequence, the number of mutations compared to a reference sequence |
| | maxchar | Prints the most frequent character of each alignment site |
| | nalign | Prints the number of alignments in the input file (phylip) |
| | nseq | Prints the number of sequences in the alignment |
| | taxa | Prints the index (position) and name of the taxa of the alignment file |
| subseq (api) | | Takes a sub-alignment from the input alignment |
| subset (api) | | Takes a subset of sequences from the input alignment |
| subsites (api) | | Takes a subset of the sites from the input alignment |
| sw (api) | | Aligns 2 sequences using the Smith-Waterman algorithm |
| tolower (api) | | Replaces upper-case characters by lower-case characters |
| toupper (api) | | Replaces lower-case characters by upper-case characters |
| translate (api) | | Translates an input sequence into amino acids |
| transpose (api) | | Transposes an input alignment (sequences<=>sites) |
| trim (api) | | Trims sequence names or the sequences themselves |
| | name | Trims names of sequences |
| | seq | Trims sequences of the input alignment |
| unalign (api) | | Unaligns the input alignment |
| version | | Prints the current version of goalign |