Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 39 changed files with 5,168 additions and 4,006 deletions.
```diff
@@ -0,0 +1,15 @@
+language: c++
+
+os: linux
+
+dist: bionic
+
+compiler: gcc
+
+before_install:
+- sudo apt-get install -y valgrind
+
+script:
+- make
+- export PATH=$PWD/bin:$PATH
+- git clone https://github.com/frederic-mahe/swarm-tests.git && cd swarm-tests && bash ./run_all_tests.sh | tee tests.log && ! grep -q FAIL tests.log
```
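The last `script` line gates the build on the external swarm-tests suite. `tee` copies the suite's output to `tests.log`, and `! grep -q FAIL tests.log` fails the step if any line contains `FAIL`. A pipeline's exit status is that of its last command unless `pipefail` is set, so this `grep` check, not the exit status of `run_all_tests.sh`, decides the outcome. The following is a minimal Python sketch of the same check. It assumes the log is plain text with results on separate lines, and the function names are hypothetical.

```python
from typing import Iterable


def suite_passed(log_lines: Iterable[str], marker: str = "FAIL") -> bool:
    """Return False if any line contains the failure marker.

    Mirrors `! grep -q FAIL tests.log`: grep exits 0 as soon as the marker
    appears anywhere in a line, and `!` inverts that status.
    """
    return not any(marker in line for line in log_lines)


def check_log(path: str = "tests.log") -> int:
    """Return a shell-style exit status for a log file: 0 = pass, 1 = fail."""
    with open(path, "r", encoding="utf-8", errors="replace") as log:
        return 0 if suite_passed(log) else 1
```

For example, `sys.exit(check_log("tests.log"))` would end a script with status 1 whenever a failure has been logged.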
```diff
@@ -1,3 +1,5 @@
+[![Build Status](https://travis-ci.org/torognes/swarm.svg?branch=swarm3)](https://travis-ci.org/torognes/swarm)
+
 # swarm

 A robust and fast clustering method for amplicon-based studies.
```
```diff
@@ -16,21 +18,32 @@ To help users, we describe
 starting from raw fastq files, clustering with **swarm** and producing
 a filtered OTU table.

-swarm 2.0 introduces several novelties and improvements over swarm
+swarm 3.0 introduces:
+* a much faster default algorithm,
+* a reduced memory footprint,
+* binaries for Windows x86-64, GNU/Linux ARM 64, and GNU/Linux POWER8,
+* an updated, hardened, and thoroughly tested code.
+
+Please note that:
+* strict dereplication of input sequences is now mandatory,
+* \-\-seeds option (\-w) now outputs results sorted by decreasing
+abundance, and then by alphabetical order of sequence labels.
+
+swarm 2.0 introduced several novelties and improvements over swarm
 1.0:
 * built-in breaking phase now performed automatically,
 * possibility to output OTU representatives in fasta format (option
 `-w`),
 * fast algorithm now used by default for *d* = 1 (linear time
 complexity),
 * a new option called *fastidious* that refines *d* = 1 results and
-reduces the number of small OTUs,
+reduces the number of small OTUs.

 ## Common misconceptions

 **swarm** is a single-linkage clustering method, with some superficial
-similarities with other clustering methods (e.g.,
-[Huse et al, 2010](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2909393/)). **swarm**'s
+similarities with other clustering methods (e.g., [Huse et al,
+2010](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2909393/)). **swarm**'s
 novelty is its iterative growth process and the use of sequence
 abundance values to delineate OTUs. **swarm** properly delineates
 large OTUs (high recall), and can distinguish OTUs with as little as
```
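The new README text states that `--seeds` (`-w`) output is now sorted by decreasing abundance, and then by sequence label. That ordering corresponds to a two-part sort key. The minimal sketch below assumes that seeds are held as hypothetical `(label, abundance)` pairs. Python compares strings by code point, which matches plain ASCII ordering but not necessarily a locale-aware "alphabetical" order.

```python
from typing import List, Tuple

Seed = Tuple[str, int]  # hypothetical representation: (sequence label, abundance)


def sort_seeds(seeds: List[Seed]) -> List[Seed]:
    """Order seeds by decreasing abundance, breaking ties by label (code-point order)."""
    return sorted(seeds, key=lambda seed: (-seed[1], seed[0]))


if __name__ == "__main__":
    seeds = [("c", 5), ("a", 12), ("b", 5), ("d", 1)]
    for label, abundance in sort_seeds(seeds):
        print(f">{label}_{abundance}")  # >a_12, >b_5, >c_5, >d_1 (one per line)
```

Negating the abundance lets a single ascending sort handle both directions at once: descending on abundance and ascending on label.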
```diff
@@ -76,8 +89,8 @@ cgtcgtcgtcgtcgt

 where sequence identifiers are unique and end with a value indicating
 the number of occurrences of the sequence (e.g., `_1000`). Alternative
-format is possible with the option `-z`, please see the
-[user manual](https://github.com/torognes/swarm/blob/master/man/swarm_manual.pdf). Swarm
+format is possible with the option `-z`, please see the [user
+manual](https://github.com/torognes/swarm/blob/master/man/swarm_manual.pdf). Swarm
 **requires** each fasta entry to present a number of occurrences to
 work properly. That crucial information can be produced during the
 [dereplication](#dereplication-mandatory) step.
```
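Dereplication collapses identical sequences into a single entry whose header carries the number of occurrences, which is the `_1000`-style value swarm expects. The minimal Python sketch below is not swarm's implementation, nor that of any dereplication tool. It makes two illustrative assumptions: sequences are lower-cased before comparison, and each unique sequence is labelled with the SHA-1 hex digest of its sequence. Under this scheme, identical sequences always receive identical labels.

```python
import hashlib
from collections import Counter
from typing import Iterable, List, Tuple

Entry = Tuple[str, int, str]  # (label, abundance, sequence)


def dereplicate(sequences: Iterable[str]) -> List[Entry]:
    """Collapse identical sequences and count how often each one occurs.

    Sequences are stripped and lower-cased before comparison (an
    assumption for this sketch). Each unique sequence is labelled with the
    SHA-1 hex digest of its sequence. Entries are sorted by decreasing
    abundance, then by label.
    """
    counts = Counter(seq.strip().lower() for seq in sequences)
    entries = [
        (hashlib.sha1(seq.encode("ascii")).hexdigest(), abundance, seq)
        for seq, abundance in counts.items()
    ]
    entries.sort(key=lambda entry: (-entry[1], entry[0]))
    return entries


def to_fasta(entries: Iterable[Entry]) -> str:
    """Write entries with the default `_<abundance>` header suffix."""
    return "".join(f">{label}_{abundance}\n{seq}\n" for label, abundance, seq in entries)


if __name__ == "__main__":
    reads = ["ACGTACGT", "acgtacgt", "ACGTACGA"]
    print(to_fasta(dereplicate(reads)), end="")
```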
```diff
@@ -87,7 +100,7 @@ Use `swarm -h` to get a short help, or see the
 for a complete description of input/output formats and command line
 options.

-The memory footprint of **swarm** is roughly 1.6 times the size of the
+The memory footprint of **swarm** is roughly 0.6 times the size of the
 input fasta file. When using the fastidious option, memory footprint
 can increase significantly. See options `-c` and `-y` to control and
 cap swarm's memory consumption.
```
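As a worked example of the revised figure, a footprint of roughly 0.6 times the input size means that a 10 GB fasta file would need about 6 GB of memory. The previous figure of 1.6 would have suggested about 16 GB. The hypothetical helper below only restates this rule of thumb as a rough planning aid. It is not a measurement, and it does not account for the fastidious option.

```python
import os


def estimate_swarm_memory(input_bytes: int, factor: float = 0.6) -> float:
    """Rule-of-thumb memory estimate (in bytes) for a default swarm run.

    factor=0.6 restates the README's figure; the fastidious option can
    increase memory use well beyond this estimate.
    """
    return factor * input_bytes


def estimate_for_file(fasta_path: str) -> float:
    """Apply the rule of thumb to the on-disk size of an input fasta file."""
    return estimate_swarm_memory(os.path.getsize(fasta_path))


if __name__ == "__main__":
    ten_gb = 10 * 10**9
    print(f"{estimate_swarm_memory(ten_gb) / 10**9:.1f} GB")  # 6.0 GB
```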
````diff
@@ -210,15 +223,10 @@ from two different sets have the same hash code, it means that the
 sequences they represent are identical.

 If for some reason your fasta entries don't have abundance values, and
-you still want to run swarm, you can easily add fake abundance values:
-
-```sh
-sed '/^>/ s/$/_1/' amplicons.fasta > amplicons_with_abundances.fasta
-```
-
-Alternatively, you may specify a default abundance value with
-**swarm**'s `--append-abundance` (`-a`) option to be used when
-abundance information is missing from a sequence.
+you still want to run swarm (not recommended), you can specify a
+default abundance value with **swarm**'s `--append-abundance` (`-a`)
+option to be used when abundance information is missing from a
+sequence.


 ### Launch swarm ###
````
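The rewritten paragraph drops the `sed` workaround, which appended `_1` to every header. It keeps `--append-abundance` (`-a`), which supplies a default that applies only when a header lacks an abundance value. The Python sketch below illustrates that fallback logic and is not swarm's parser. It assumes two header formats: the default annotation is a trailing `_<integer>`, and the `-z` alternative is the usearch-style `;size=<integer>;` annotation. The function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Default annotation: label followed by "_<abundance>", e.g. ">seq1_1000"
UNDERSCORE_STYLE = re.compile(r"^(?P<label>.+)_(?P<abundance>[0-9]+)$")
# usearch-style annotation, assumed here for -z: ">seq1;size=1000;"
SIZE_STYLE = re.compile(r"^(?P<label>[^;]+);size=(?P<abundance>[0-9]+);?$")


def parse_header(header: str, usearch_style: bool = False,
                 default_abundance: Optional[int] = None) -> Tuple[str, int]:
    """Return (label, abundance) for a fasta header line.

    If no abundance annotation is found, fall back to default_abundance,
    mirroring the intent of --append-abundance; without a default, a
    missing value is treated as an error.
    """
    text = header.strip().lstrip(">")
    pattern = SIZE_STYLE if usearch_style else UNDERSCORE_STYLE
    match = pattern.match(text)
    if match:
        return match.group("label"), int(match.group("abundance"))
    if default_abundance is None:
        raise ValueError(f"no abundance value in header {header!r}")
    return text, default_abundance
```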
````diff
@@ -305,15 +313,6 @@ rm "${AMPLICONS}"
 ```


-## Troubleshooting ##
-
-If **swarm** exits with an error message saying `This program
-requires a processor with SSE2`, your computer is too old to run
-**swarm** (or based on a non x86-64 architecture). **swarm** only runs
-on CPUs with the SSE2 instructions, i.e. most Intel and AMD CPUs
-released since 2004.
-
-
 ## Citation ##

 To cite **swarm**, please refer to:
````
```diff
@@ -333,7 +332,7 @@ You are welcome to:

 * submit suggestions and bug-reports at: https://github.com/torognes/swarm/issues
 * send a pull request on: https://github.com/torognes/swarm/
-* compose a friendly e-mail to: Frédéric Mahé <mahe@rhrk.uni-kl.de> and Torbjørn Rognes <[email protected]>
+* compose a friendly e-mail to: Frédéric Mahé <frederic.mahe@cirad.fr> and Torbjørn Rognes <[email protected]>


 ## Third-party pipelines ##
```
```diff
@@ -356,7 +355,7 @@ You are welcome to:
 If you want to try alternative free and open-source clustering
 methods, here are some links:

-* [VSEARCH](https://github.com/torognes/vsearch)
+* [vsearch](https://github.com/torognes/vsearch)
 * [Oligotyping](http://merenlab.org/projects/oligotyping/)
 * [DNAclust](http://dnaclust.sourceforge.net/)
 * [Sumaclust](http://metabarcoding.org/sumatra)
```
```diff
@@ -365,6 +364,11 @@ methods, here are some links:

 ## Version history ##

+### version 3.0 ###
+
+**swarm** 3.0 is much faster when _d_ = 1, and consumes less memory.
+Strict dereplication is now mandatory.
+
 ### version 2.2.2 ###

 **swarm** 2.2.2 fixes a bug causing Swarm to wait forever in very rare
```
Binary file not shown.
```diff
@@ -1,15 +1,13 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
 # -*- coding: utf-8 -*-
 """
 Read all fasta files and build a sorted amplicon contingency
-table. Usage: python amplicon_contingency_table.py samples_*.fas
+table. Usage: python3 amplicon_contingency_table.py samples_*.fas
 """

-from __future__ import print_function
-
-__author__ = "Frédéric Mahé <[email protected]>"
-__date__ = "2016/03/12"
-__version__ = "$Revision: 2.1"
+__author__ = "Frédéric Mahé <[email protected]>"
+__date__ = "2019/09/24"
+__version__ = "$Revision: 3.0"

 import os
 import sys
```
```diff
@@ -35,7 +33,7 @@ def fasta_parse():
 sample = os.path.basename(fasta_file)
 sample = os.path.splitext(sample)[0]
 samples[sample] = samples.get(sample, 0) + 1
-with open(fasta_file, "rU") as fasta_file:
+with open(fasta_file, "r") as fasta_file:
 for line in fasta_file:
 if line.startswith(">"):
 amplicon, abundance = line.strip(">;\n").split(separator)
```
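Replacing `"rU"` with `"r"` loses nothing in Python 3. Text mode already applies universal-newline translation by default, and the `"U"` flag was removed in Python 3.11. The minimal sketch below reads headers from a file with Windows line endings in plain `"r"` mode, stripping them the same way as `fasta_parse()` does. The function name and sample data are hypothetical.

```python
import os
import tempfile
from typing import Iterator


def read_headers(path: str) -> Iterator[str]:
    """Yield fasta header lines without the leading '>' and trailing newline.

    Python 3 text mode ("r") translates CRLF and CR line endings to LF by
    default, which is what "rU" used to request.
    """
    with open(path, "r") as fasta:
        for line in fasta:
            if line.startswith(">"):
                yield line.strip(">;\n")


if __name__ == "__main__":
    # Write a tiny fasta file with Windows-style (CRLF) line endings
    with tempfile.NamedTemporaryFile("wb", suffix=".fas", delete=False) as tmp:
        tmp.write(b">amp1_10\r\nACGT\r\n>amp2_3\r\nACGA\r\n")
    try:
        print(list(read_headers(tmp.name)))  # ['amp1_10', 'amp2_3']
    finally:
        os.remove(tmp.name)
```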
```diff
@@ -65,7 +63,7 @@ def main():
 all_amplicons, amplicons2samples, samples = fasta_parse()

 # Sort amplicons by decreasing abundance (and by amplicon name)
-sorted_all_amplicons = sorted(all_amplicons.iteritems(),
+sorted_all_amplicons = sorted(iter(all_amplicons.items()),
 key=operator.itemgetter(1, 0))
 sorted_all_amplicons.reverse()

```
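The `iter(all_amplicons.items())` form is what the 2to3 `dict` fixer produces for `iteritems()`. It is harmless but unnecessary, because `sorted()` accepts any iterable. Sorting ascending on `(abundance, name)` and then reversing also places tied amplicons in *decreasing* name order. The small check below uses hypothetical data to show both points. The two variants agree here because dictionary keys make every `(abundance, name)` pair unique. If full ties were possible, `reverse=True` would keep equal items in their original order, whereas `.reverse()` would flip them.

```python
import operator

# Hypothetical totals: amplicon name -> total abundance across samples
all_amplicons = {"amp_b": 5, "amp_a": 5, "amp_c": 12}

# As in the new code: ascending sort on (abundance, name), then reverse in place
sorted_all_amplicons = sorted(iter(all_amplicons.items()),
                              key=operator.itemgetter(1, 0))
sorted_all_amplicons.reverse()

# Same result without the redundant iter() and with reverse=True
one_liner = sorted(all_amplicons.items(),
                   key=operator.itemgetter(1, 0), reverse=True)

assert sorted_all_amplicons == one_liner
# Tied amplicons come out in decreasing name order: amp_b before amp_a
print(one_liner)  # [('amp_c', 12), ('amp_b', 5), ('amp_a', 5)]
```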
```diff
@@ -1,29 +1,24 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
 # -*- coding: utf-8 -*-
 """
 Visualize the internal structure of a swarm (color vertices by
-abundance). Requires the module igraph and python 2.7+.
-Limitations: amplicons grafted with the fastidious option will be
-discarded and will not be visualized.
+abundance). Requires the module igraph and python 3.
 """

-from __future__ import print_function
-
-__author__ = "Frédéric Mahé <[email protected]>"
-__date__ = "2016/11/09"
-__version__ = "$Revision: 3.1"
+__author__ = "Frédéric Mahé <[email protected]>"
+__date__ = "2019/09/24"
+__version__ = "$Revision: 4.0"

 import sys
 import os.path
 from igraph import Graph, plot
 from optparse import OptionParser

-#*****************************************************************************#
+# *************************************************************************** #
 # #
 # Functions #
 # #
-#*****************************************************************************#
+# *************************************************************************** #


 def option_parse():
```
```diff
@@ -76,7 +71,7 @@ def parse_files(swarms, internal_structure, OTU, drop):
 """
 # List amplicon ids and abundances
 amplicons = list()
-with open(swarms, "rU") as swarms:
+with open(swarms, "r") as swarms:
 for i, swarm in enumerate(swarms):
 if i == OTU - 1:
 # Deal with ";size=" in a rather clumsy way... but it works
```
```diff
@@ -100,7 +95,7 @@ def parse_files(swarms, internal_structure, OTU, drop):

 # List pairwise relations
 relations = list()
-with open(internal_structure, "rU") as internal_structure:
+with open(internal_structure, "r") as internal_structure:
 print("Parsing amplicon relationships", file=sys.stdout)
 for line in internal_structure:
 # Get the first four elements of the line
```
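The script builds its graph edges from the first four fields of each line in swarm's internal-structure output. The sketch below shows that parsing step. It assumes tab-separated lines whose first four fields are the parent amplicon, the child amplicon, the number of differences between them, and the OTU number. The function name and the filtering on a single OTU are illustrative and are not the script's actual code.

```python
from typing import Iterable, List, Tuple

Relation = Tuple[str, str, int, int]  # (parent, child, differences, OTU number)


def parse_relations(lines: Iterable[str], otu: int) -> List[Relation]:
    """Keep the first four fields of each line that belongs to one OTU.

    Assumes tab-separated fields; lines with fewer than four fields
    (e.g. empty lines) are skipped.
    """
    relations = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        parent, child, differences, otu_number = fields[:4]
        if int(otu_number) == otu:
            relations.append((parent, child, int(differences), int(otu_number)))
    return relations
```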
```diff
@@ -138,7 +133,7 @@ def build_graph(amplicons, relations):

 amplicon_ids = [amplicon[0] for amplicon in amplicons]
 abundances = [int(amplicon[1]) for amplicon in amplicons]
-minimum, maximum = min(abundances), max(abundances)
+maximum = max(abundances)

 # Determine canvas size
 if len(abundances) < 500:
```
```diff
@@ -214,11 +209,11 @@ def main():
 return


-#*****************************************************************************#
+# *************************************************************************** #
 # #
 # Body #
 # #
-#*****************************************************************************#
+# *************************************************************************** #

 if __name__ == '__main__':

```