Commit

Merge with remote repo
torognes committed Jul 6, 2016
2 parents 6aa9991 + af5d65f commit 3d605ab
Showing 5 changed files with 132 additions and 139 deletions.
22 changes: 18 additions & 4 deletions CITATION
@@ -1,8 +1,7 @@
Please cite swarm as follows:

Mahé F, Rognes T, Quince C, de Vargas C, Dunthorn M. (2014) Swarm: robust and fast clustering method for amplicon-based studies. PeerJ 2:e593 <http://dx.doi.org/10.7717/peerj.593>


- Mahé F, Rognes T, Quince C, de Vargas C, Dunthorn M. (2014) Swarm: robust and fast clustering method for amplicon-based studies. PeerJ 2:e593 <http://dx.doi.org/10.7717/peerj.593>
- Mahé F, Rognes T, Quince C, de Vargas C, Dunthorn M. (2015) Swarm v2: highly-scalable and high-resolution amplicon clustering. PeerJ 3:e1420 <https://doi.org/10.7717/peerj.1420>

Bibtex format:

@article{10.7717/peerj.593,
@@ -19,3 +18,18 @@ Bibtex format:
url = {http://dx.doi.org/10.7717/peerj.593},
doi = {10.7717/peerj.593}
}

@article{10.7717/peerj.1420,
title = {Swarm v2: highly-scalable and high-resolution amplicon clustering},
author = {Mahé, Frédéric and Rognes, Torbjørn and Quince, Christopher and de Vargas, Colomban and Dunthorn, Micah},
year = {2015},
month = {12},
keywords = {Environmental diversity, Barcoding, Molecular operational taxonomic units},
abstract = {Previously we presented Swarm v1, a novel and open source amplicon clustering program that produced fine-scale molecular operational taxonomic units (OTUs), free of arbitrary global clustering thresholds and input-order dependency. Swarm v1 worked with an initial phase that used iterative single-linkage with a local clustering threshold (\textit{d}), followed by a phase that used the internal abundance structures of clusters to break chained OTUs. Here we present Swarm v2, which has two important novel features: (1) a new algorithm for \textit{d} = 1 that allows the computation time of the program to scale linearly with increasing amounts of data; and (2) the new fastidious option that reduces under-grouping by grafting low abundant OTUs (e.g., singletons and doubletons) onto larger ones. Swarm v2 also directly integrates the clustering and breaking phases, dereplicates sequencing reads with \textit{d} = 0, outputs OTU representatives in fasta format, and plots individual OTUs as two-dimensional networks.},
volume = {3},
pages = {e1420},
journal = {PeerJ},
issn = {2167-8359},
url = {https://doi.org/10.7717/peerj.1420},
doi = {10.7717/peerj.1420}
}
114 changes: 42 additions & 72 deletions README.md
@@ -3,13 +3,18 @@
A robust and fast clustering method for amplicon-based studies.

The purpose of **swarm** is to provide a novel clustering algorithm
that handles massive sets of amplicons. Traditional clustering
algorithms results are strongly input-order dependent, and rely on an
arbitrary **global** clustering threshold. **swarm** results are
that handles massive sets of amplicons. Results of traditional
clustering algorithms are strongly input-order dependent, and rely on
an arbitrary **global** clustering threshold. **swarm** results are
resilient to input-order changes and rely on a small **local** linking
threshold *d*, the maximum number of differences between two
amplicons. **swarm** forms stable, high-resolution clusters, with a
high yield of biological information.
threshold *d*, representing the maximum number of differences between
two amplicons. **swarm** forms stable, high-resolution clusters, with
a high yield of biological information.
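
To make *d* concrete, here is a minimal sketch that counts the differences between two amplicons as a plain Levenshtein distance (substitutions, insertions and deletions each count as one). This is an assumption for illustration only: **swarm** derives differences from its own pairwise global alignment, so the two counts are not guaranteed to match in every case, and `count_differences` is a hypothetical helper, not part of **swarm**.

```shell
# count_differences: hypothetical helper, not part of swarm.
# Prints the Levenshtein distance (substitutions, insertions and
# deletions, each costing 1) between two sequences given as arguments.
count_differences() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = length(a); m = length(b)
        # prev[j]: distance between the first i-1 characters of a
        # and the first j characters of b (one row of the DP table)
        for (j = 0; j <= m; j++) prev[j] = j
        for (i = 1; i <= n; i++) {
            cur[0] = i
            for (j = 1; j <= m; j++) {
                cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
                best = prev[j - 1] + cost                          # match or substitution
                if (prev[j] + 1 < best) best = prev[j] + 1         # deletion
                if (cur[j - 1] + 1 < best) best = cur[j - 1] + 1   # insertion
                cur[j] = best
            }
            for (j = 0; j <= m; j++) prev[j] = cur[j]
        }
        print prev[m]
    }'
}
```

For example, `count_differences ACGTACGT ACGTTACGT` prints `1`: the two amplicons differ by a single insertion and fall within the *d* = 1 threshold.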

To help users, we describe
[a complete pipeline](https://github.com/frederic-mahe/swarm/wiki/Fred's-metabarcoding-pipeline)
starting from raw fastq files, clustering with **swarm** and producing
a filtered OTU table.

**swarm** 2.0 introduces several novelties and improvements over
swarm 1.0:
@@ -21,8 +26,8 @@ high yield of biological information.
* a new option called *fastidious* that refines *d* = 1 results and
reduces the number of small OTUs,

Table of Content
================
Table of Contents
=================

* [Common misconceptions](#common_misconceptions)
* [Quick start](#quick_start)
@@ -41,48 +46,7 @@ Table of Content
* [Contact](#contact)
* [Third-party pipelines](#pipelines)
* [Alternatives](#alternatives)
* [New features](#features)
* [version 2.1.8](#version218)
* [version 2.1.7](#version217)
* [version 2.1.6](#version216)
* [version 2.1.5](#version215)
* [version 2.1.4](#version214)
* [version 2.1.3](#version213)
* [version 2.1.2](#version212)
* [version 2.1.1](#version211)
* [version 2.1.0](#version210)
* [version 2.0.7](#version207)
* [version 2.0.6](#version206)
* [version 2.0.5](#version205)
* [version 2.0.4](#version204)
* [version 2.0.3](#version203)
* [version 2.0.2](#version202)
* [version 2.0.1](#version201)
* [version 2.0.0](#version200)
* [version 1.2.21](#version1221)
* [version 1.2.20](#version1220)
* [version 1.2.19](#version1219)
* [version 1.2.18](#version1218)
* [version 1.2.17](#version1217)
* [version 1.2.16](#version1216)
* [version 1.2.15](#version1215)
* [version 1.2.14](#version1214)
* [version 1.2.13](#version1213)
* [version 1.2.12](#version1212)
* [version 1.2.11](#version1211)
* [version 1.2.10](#version1210)
* [version 1.2.9](#version129)
* [version 1.2.8](#version128)
* [version 1.2.7](#version127)
* [version 1.2.6](#version126)
* [version 1.2.5](#version125)
* [version 1.2.4](#version124)
* [version 1.2.3](#version123)
* [version 1.2.2](#version122)
* [version 1.2.1](#version121)
* [version 1.2.0](#version120)
* [version 1.1.1](#version111)
* [version 1.1.0](#version110)
* [Version history](#history)

<a name="common_misconceptions"/>
## Common misconceptions ##
@@ -91,9 +55,9 @@ Table of Content
similarities with other clustering methods (e.g.,
[Huse et al, 2010](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2909393/)). **swarm**'s
novelty is its iterative growth process and the use of sequence
abundance values to delineate OTUs. Swarm properly delineates large
OTUs (high recall), while being able to distinguish OTUs with as
little as two differences between their centers (high precision).
abundance values to delineate OTUs. **swarm** properly delineates
large OTUs (high recall), and can distinguish OTUs with as little as
two differences between their centers (high precision).

**swarm** uses a local clustering threshold (*d*), not a global
clustering threshold like other algorithms do. Users may be tempted
@@ -124,7 +88,7 @@ Table of Content
./swarm amplicons.fasta
```

That command will apply default parameters to the fasta file
That command will apply default parameters (`-d 1`) to the fasta file
`amplicons.fasta`. The fasta file must be formatted as follows:

```
@@ -136,7 +100,7 @@ cgtcgtcgtcgtcgt

where sequence identifiers are unique and end with a value indicating
the number of occurrences of the sequence (e.g., `_1000`). Alternative
formats are possible, please see the
format is possible with the option `-z`, please see the
[user manual](https://github.com/torognes/swarm/blob/master/man/swarm_manual.pdf). Swarm
**requires** each fasta entry to present a number of occurrences to
work properly. That crucial information can be produced during the
@@ -203,7 +167,7 @@ converted to fasta.
<a name="linearization"/>
### Linearization ###

Swarm accepts wrapped fasta files as well as linear fasta
**swarm** accepts wrapped fasta files as well as linear fasta
files. However, linear fasta files where amplicons are written on two
lines (one line for the fasta header, one line for the sequence) are
easier to manipulate. For instance, many post-clustering queries can
@@ -280,9 +244,9 @@ you still want to run swarm, you can easily add fake abundance values:
sed '/^>/ s/$/_1/' amplicons.fasta > amplicons_with_abundances.fasta
```

Alternatively, you may specify a default abundance value with the
`--append-abundance` (`-a`) option to be used when abundance information
is missing from a sequence.
Alternatively, you may specify a default abundance value with
**swarm**'s `--append-abundance` (`-a`) option to be used when
abundance information is missing from a sequence.
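
Before launching **swarm**, it can help to count how many entries lack the `_INTEGER` abundance suffix. A minimal sketch, assuming the default header format described above (headers in the alternative format selected with `-z` would need a different pattern); `missing_abundances` is a hypothetical helper, not part of **swarm**:

```shell
# missing_abundances: hypothetical helper, not part of swarm.
# Prints the number of fasta headers that do not end with "_" followed
# by an integer (e.g. ">seq1_1000"); 0 means every entry has an abundance.
missing_abundances() {
    awk '/^>/ && $0 !~ /_[0-9]+$/ { n++ } END { print n + 0 }' "$1"
}
```

If `missing_abundances amplicons.fasta` prints anything other than `0`, add abundance values with the `sed` command above or with the `-a` option.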

<a name="launch"/>
### Launch swarm ###
@@ -293,11 +257,11 @@ Here is a typical way to use **swarm**:
./swarm -f -t 4 -w OTU_representatives.fasta amplicons.fasta > /dev/null
```

Swarm will partition your dataset with the finest resolution (local
number of differences *d* = 1 by default, built-in elimination of
potential chained OTUs, fastidious processing) using 4 CPU-cores. OTU
representatives will be written to a new fasta file, other results
will be discarded (`/dev/null`).
**swarm** will partition your dataset with the finest resolution
(local number of differences *d* = 1 by default, built-in elimination
of potential chained OTUs, fastidious processing) using 4
CPU-cores. OTU representatives will be written to a new fasta file,
other results will be discarded (`/dev/null`).
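
To keep the OTU list instead of discarding it, standard output can be redirected to a file rather than `/dev/null` (for example `> amplicons.swarms`; the file name is only an illustration). Assuming the default output format of one OTU per line, with amplicon identifiers separated by spaces (check the user manual), a quick summary could look like this; `otu_summary` is a hypothetical helper, not part of **swarm**:

```shell
# otu_summary: hypothetical helper, not part of swarm.
# Assumes one OTU per line, amplicon identifiers separated by spaces.
otu_summary() {
    awk '{ if (NF > max) max = NF }   # track the widest line (largest OTU)
         END { printf "OTUs: %d, largest OTU: %d amplicons\n", NR, max }' "$1"
}
```

For example: `./swarm -f -t 4 -w OTU_representatives.fasta amplicons.fasta > amplicons.swarms`, then `otu_summary amplicons.swarms`.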

See the
[user manual](https://github.com/torognes/swarm/blob/master/man/swarm_manual.pdf)
@@ -317,12 +281,12 @@ that the amplicon fasta file was prepared as describe above
### Refine swarm OTUs ###

The chain-breaking, which used to be performed in a second step in
swarm 1.0, is now built-in and performed by default. It is possible to
deactivate it with the `--no-otu-breaking` option, but it is not
recommended. The fastidious option is recommended when using *d* = 1,
as it will reduce the number of small OTUs while maintaining a high
clustering resolution. The principle of the fastidious option is
described in the figure below:
**swarm** 1.0, is now built-in and performed by default. It is
possible to deactivate it with the `--no-otu-breaking` option, but it
is not recommended. The fastidious option is recommended when using
*d* = 1, as it will reduce the number of small OTUs while maintaining
a high clustering resolution. The principle of the fastidious option
is described in the figure below:


![](https://github.com/frederic-mahe/swarm/blob/master/figures/swarm_2.0_fastidious_reduced.png)
@@ -433,8 +397,14 @@ methods, here are some links:
* [Sumaclust](http://metabarcoding.org/sumatra)
* [Crunchclust](https://code.google.com/p/crunchclust/)

<a name="features"/>
## New features##

<a name="history"/>
## Version history##

<a name="version219"/>
### version 2.1.9 ###

**swarm** 2.1.9 fixes a problem when compiling with GCC version 6.

<a name="version218"/>
### version 2.1.8 ###