Merge with remote repo

torognes · Jul 6, 2016 · 3d605ab
2 parents 6aa9991 + af5d65f
commit 3d605ab

Showing 5 changed files with 132 additions and 139 deletions.
diff --git a/CITATION b/CITATION
@@ -1,8 +1,7 @@
 Please cite swarm as follows:
-
-Mahé F, Rognes T, Quince C, de Vargas C, Dunthorn M. (2014) Swarm: robust and fast clustering method for amplicon-based studies. PeerJ 2:e593 <http://dx.doi.org/10.7717/peerj.593>
-
-
+- Mahé F, Rognes T, Quince C, de Vargas C, Dunthorn M. (2014) Swarm: robust and fast clustering method for amplicon-based studies. PeerJ 2:e593 <http://dx.doi.org/10.7717/peerj.593>
+- Mahé F, Rognes T, Quince C, de Vargas C, Dunthorn M. (2015) Swarm v2: highly-scalable and high-resolution amplicon clustering. PeerJ 3:e1420 <https://doi.org/10.7717/peerj.1420>
+
 Bibtex format:
 
 @article{10.7717/peerj.593,
@@ -19,3 +18,18 @@ Bibtex format:
  url = {http://dx.doi.org/10.7717/peerj.593},
  doi = {10.7717/peerj.593}
 }
+
+@article{10.7717/peerj.1420,
+ title = {Swarm v2: highly-scalable and high-resolution amplicon clustering},
+ author = {Mahé, Frédéric and Rognes, Torbjørn and Quince, Christopher and de Vargas, Colomban and Dunthorn, Micah},
+ year = {2015},
+ month = {12},
+ keywords = {Environmental diversity, Barcoding, Molecular operational taxonomic units},
+ abstract = {Previously we presented Swarm v1, a novel and open source amplicon clustering program that produced fine-scale molecular operational taxonomic units (OTUs), free of arbitrary global clustering thresholds and input-order dependency. Swarm v1 worked with an initial phase that used iterative single-linkage with a local clustering threshold (\textit{d}), followed by a phase that used the internal abundance structures of clusters to break chained OTUs. Here we present Swarm v2, which has two important novel features: (1) a new algorithm for \textit{d} = 1 that allows the computation time of the program to scale linearly with increasing amounts of data; and (2) the new fastidious option that reduces under-grouping by grafting low abundant OTUs (e.g., singletons and doubletons) onto larger ones. Swarm v2 also directly integrates the clustering and breaking phases, dereplicates sequencing reads with \textit{d} = 0, outputs OTU representatives in fasta format, and plots individual OTUs as two-dimensional networks.},
+ volume = {3},
+ pages = {e1420},
+ journal = {PeerJ},
+ issn = {2167-8359},
+ url = {https://doi.org/10.7717/peerj.1420},
+ doi = {10.7717/peerj.1420}
+}
diff --git a/README.md b/README.md
@@ -3,13 +3,18 @@
 A robust and fast clustering method for amplicon-based studies.
 
 The purpose of **swarm** is to provide a novel clustering algorithm
-that handles massive sets of amplicons. Traditional clustering
-algorithms results are strongly input-order dependent, and rely on an
-arbitrary **global** clustering threshold. **swarm** results are
+that handles massive sets of amplicons. Results of traditional
+clustering algorithms are strongly input-order dependent, and rely on
+an arbitrary **global** clustering threshold. **swarm** results are
 resilient to input-order changes and rely on a small **local** linking
-threshold *d*, the maximum number of differences between two
-amplicons. **swarm** forms stable, high-resolution clusters, with a
-high yield of biological information.
+threshold *d*, representing the maximum number of differences between
+two amplicons. **swarm** forms stable, high-resolution clusters, with
+a high yield of biological information.
+
+To help users, we describe
+[a complete pipeline](https://github.com/frederic-mahe/swarm/wiki/Fred's-metabarcoding-pipeline)
+starting from raw fastq files, clustering with **swarm** and producing
+a filtered OTU table.
 
 **swarm** 2.0 introduces several novelties and improvements over
   swarm 1.0:
@@ -21,8 +26,8 @@ high yield of biological information.
 * a new option called *fastidious* that refines *d* = 1 results and
   reduces the number of small OTUs,
 
-Table of Content
-================
+Table of Contents
+=================
 
 * [Common misconceptions](#common_misconceptions)
 * [Quick start](#quick_start)
@@ -41,48 +46,7 @@ Table of Content
 * [Contact](#contact)
 * [Third-party pipelines](#pipelines)
 * [Alternatives](#alternatives)
-* [New features](#features)
-  * [version 2.1.8](#version218)
-  * [version 2.1.7](#version217)
-  * [version 2.1.6](#version216)
-  * [version 2.1.5](#version215)
-  * [version 2.1.4](#version214)
-  * [version 2.1.3](#version213)
-  * [version 2.1.2](#version212)
-  * [version 2.1.1](#version211)
-  * [version 2.1.0](#version210)
-  * [version 2.0.7](#version207)
-  * [version 2.0.6](#version206)
-  * [version 2.0.5](#version205)
-  * [version 2.0.4](#version204)
-  * [version 2.0.3](#version203)
-  * [version 2.0.2](#version202)
-  * [version 2.0.1](#version201)
-  * [version 2.0.0](#version200)
-  * [version 1.2.21](#version1221)
-  * [version 1.2.20](#version1220)
-  * [version 1.2.19](#version1219)
-  * [version 1.2.18](#version1218)
-  * [version 1.2.17](#version1217)
-  * [version 1.2.16](#version1216)
-  * [version 1.2.15](#version1215)
-  * [version 1.2.14](#version1214)
-  * [version 1.2.13](#version1213)
-  * [version 1.2.12](#version1212)
-  * [version 1.2.11](#version1211)
-  * [version 1.2.10](#version1210)
-  * [version 1.2.9](#version129)
-  * [version 1.2.8](#version128)
-  * [version 1.2.7](#version127)
-  * [version 1.2.6](#version126)
-  * [version 1.2.5](#version125)
-  * [version 1.2.4](#version124)
-  * [version 1.2.3](#version123)
-  * [version 1.2.2](#version122)
-  * [version 1.2.1](#version121)
-  * [version 1.2.0](#version120)
-  * [version 1.1.1](#version111)
-  * [version 1.1.0](#version110)
+* [Version history](#history)
 
 <a name="common_misconceptions"/>
 ## Common misconceptions ##
@@ -91,9 +55,9 @@ Table of Content
   similarities with other clustering methods (e.g.,
   [Huse et al, 2010](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2909393/)). **swarm**'s
   novelty is its iterative growth process and the use of sequence
-  abundance values to delineate OTUs. Swarm properly delineates large
-  OTUs (high recall), while being able to distinguish OTUs with as
-  little as two differences between their centers (high precision).
+  abundance values to delineate OTUs. **swarm** properly delineates
+  large OTUs (high recall), and can distinguish OTUs with as little as
+  two differences between their centers (high precision).
 
 **swarm** uses a local clustering threshold (*d*), not a global
   clustering threshold like other algorithms do. Users may be tempted
@@ -124,7 +88,7 @@ Table of Content
 ./swarm amplicons.fasta
 ```
 
-That command will apply default parameters to the fasta file
+That command will apply default parameters (`-d 1`) to the fasta file
 `amplicons.fasta`. The fasta file must be formatted as follows:
 
 ```
@@ -136,7 +100,7 @@ cgtcgtcgtcgtcgt
 
 where sequence identifiers are unique and end with a value indicating
 the number of occurrences of the sequence (e.g., `_1000`). Alternative
-formats are possible, please see the
+format is possible with the option `-z`, please see the
 [user manual](https://github.com/torognes/swarm/blob/master/man/swarm_manual.pdf). Swarm
 **requires** each fasta entry to present a number of occurrences to
 work properly. That crucial information can be produced during the
@@ -203,7 +167,7 @@ converted to fasta.
 <a name="linearization"/>
 ### Linearization ###
 
-Swarm accepts wrapped fasta files as well as linear fasta
+**swarm** accepts wrapped fasta files as well as linear fasta
 files. However, linear fasta files where amplicons are written on two
 lines (one line for the fasta header, one line for the sequence) are
 easier to manipulate. For instance, many post-clustering queries can
@@ -280,9 +244,9 @@ you still want to run swarm, you can easily add fake abundance values:
 sed '/^>/ s/$/_1/' amplicons.fasta > amplicons_with_abundances.fasta
 ```
 
-Alternatively, you may specify a default abundance value with the
-`--append-abundance` (`-a`) option to be used when abundance information
-is missing from a sequence.
+Alternatively, you may specify a default abundance value with
+**swarm**'s `--append-abundance` (`-a`) option to be used when
+abundance information is missing from a sequence.
 
 <a name="launch"/>
 ### Launch swarm ###
@@ -293,11 +257,11 @@ Here is a typical way to use **swarm**:
 ./swarm -f -t 4 -w OTU_representatives.fasta amplicons.fasta > /dev/null
 ```
 
-Swarm will partition your dataset with the finest resolution (local
-number of differences *d* = 1 by default, built-in elimination of
-potential chained OTUs, fastidious processing) using 4 CPU-cores. OTU
-representatives will be written to a new fasta file, other results
-will be discarded (`/dev/null`).
+**swarm** will partition your dataset with the finest resolution
+(local number of differences *d* = 1 by default, built-in elimination
+of potential chained OTUs, fastidious processing) using 4
+CPU-cores. OTU representatives will be written to a new fasta file,
+other results will be discarded (`/dev/null`).
 
 See the
 [user manual](https://github.com/torognes/swarm/blob/master/man/swarm_manual.pdf)
@@ -317,12 +281,12 @@ that the amplicon fasta file was prepared as describe above
 ### Refine swarm OTUs ###
 
 The chain-breaking, which used to be performed in a second step in
-swarm 1.0, is now built-in and performed by default. It is possible to
-deactivate it with the `--no-otu-breaking` option, but it is not
-recommended. The fastidious option is recommended when using *d* = 1,
-as it will reduce the number of small OTUs while maintaining a high
-clustering resolution. The principle of the fastidious option is
-described in the figure below:
+**swarm** 1.0, is now built-in and performed by default. It is
+possible to deactivate it with the `--no-otu-breaking` option, but it
+is not recommended. The fastidious option is recommended when using
+*d* = 1, as it will reduce the number of small OTUs while maintaining
+a high clustering resolution. The principle of the fastidious option
+is described in the figure below:
 
 
 ![](https://github.com/frederic-mahe/swarm/blob/master/figures/swarm_2.0_fastidious_reduced.png)
@@ -433,8 +397,14 @@ methods, here are some links:
 * [Sumaclust](http://metabarcoding.org/sumatra)
 * [Crunchclust](https://code.google.com/p/crunchclust/)
 
-<a name="features"/>
-## New features##
+
+<a name="history"/>
+## Version history##
+
+<a name="version219"/>
+### version 2.1.9 ###
+
+**swarm** 2.1.9 fixes a problem when compiling with GCC version 6.
 
 <a name="version218"/>
 ### version 2.1.8 ###