Lederhosen

Cluster raw Illumina 16S rRNA amplicon data to generate OTUs.

Who can use Lederhosen?

Lederhosen is free and open source, released under the MIT license.

How do I get Lederhosen?

Obtain & Install UCLUST
Obtain & Install BLAT
Get a copy of TaxCollector
Install Lederhosen by typing:

sudo gem install lederhosen

Check the installation by typing lederhosen. You should see some help text.

Features

Sequence trimming (paired-end Illumina).
K-mer filtering.
Clustering w/ UCLUST.
UCLUST output filtering.
Separation of representative reads.
Separation of all reads belonging to each cluster.
Identification of clusters using TaxCollector.
Generation of OTU abundance matrices.

How do I use Lederhosen?

Lederhosen is a convenience wrapper around UCLUST and BLAT, with additional scripts for quality filtering, de-noising, and table generation. It is similar to QIIME but is meant for paired-end Illumina data rather than single-end 454 data.

The basic Lederhosen pipeline consists of trimming, joining, sorting, filtering, clustering, more filtering, and output generation (OTU tables, representative reads, reads by cluster, and taxonomic descriptions for clusters). See the example pipeline in pipeline.sh.
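The stages of the basic pipeline can be strung together roughly as follows. This is a sketch assembled only from the example commands in the Tasks section of this README, not a copy of pipeline.sh. The `step` helper, the `pipeline` function and the `RUN` variable are illustrative names, and the optional k_filter step is left out. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the basic pipeline, using the example commands from this README.
# By default each command is only printed; set RUN=1 to execute them
# (that requires lederhosen, UCLUST and BLAT to be installed).

step() {
  printf '+ %s\n' "$*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@" || exit 1
  fi
}

pipeline() {
  # Glob patterns are quoted so the shell passes them through unchanged.
  step lederhosen trim '--reads_dir=reads/*.txt' --out_dir=trimmed/
  step lederhosen join '--trimmed=trimmed/*.fasta' --output=joined.fasta
  step lederhosen sort --input=joined.fasta --output=sorted.fasta
  step lederhosen cluster --input=sorted.fasta --identity=0.80 --output=clusters.uc
  step lederhosen uc_filter --input=clusters.uc --output=clusters.uc.filtered --reads=50 --samples=10
  # File names below are taken as-is from the README examples.
  step lederhosen otu_table --clusters=clusters.uc --output=otu_prefix.csv
  step lederhosen rep_reads --clusters=clusters.uc --joined=joined.fasta --output=representative_reads.fasta
}

pipeline
```

Running the script with `RUN=1` in the environment executes each step and stops at the first failure.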

Tasks

Lederhosen is invoked by typing lederhosen [TASK], where [TASK] is one of the tasks described below.

trim

Trim (Illumina) reads using quality scores. Output will be a directory of fasta files. Reads can optionally be gzipped.

lederhosen trim --reads_dir=reads/*.txt --out_dir=trimmed/

join

Join paired reads from all samples end-to-end. This enables the use of UCLUST with paired-end data. Output will be a single fasta file.

lederhosen join --trimmed=trimmed/*.fasta --output=joined.fasta

If your reads are not paired, you can skip this step. Instead, concatenate all of the trimmed read files:

cat trimmed/*.fasta > joined.fasta

sort

Sort reads by length. This is a requirement for UCLUST's single-linkage clustering algorithm.

lederhosen sort --input=joined.fasta --output=sorted.fasta
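To illustrate what this step produces, here is a minimal sketch that orders FASTA records by sequence length using standard tools. It is not Lederhosen's implementation. The function name `sort_fasta_by_length` is hypothetical, longest-first order is an assumption (it matches the decreasing-length order of UCLUST's own sort), and headers are assumed not to contain tab characters.

```shell
#!/bin/sh
# Print the FASTA records in file $1, longest sequence first (assumed order).
# Multi-line sequences are joined onto a single line.
sort_fasta_by_length() {
  tab=$(printf '\t')
  awk '
    /^>/ { if (h != "") print length(s) "\t" h "\t" s; h = $0; s = ""; next }
         { s = s $0 }
    END  { if (h != "") print length(s) "\t" h "\t" s }
  ' "$1" |
    sort -t "$tab" -k1,1nr |
    awk -F '\t' '{ print $2; print $3 }'
}

# Small demonstration on three made-up reads.
demo=$(mktemp)
cat > "$demo" <<'EOF'
>read_a
ACGT
>read_b
ACGTACGTAC
GT
>read_c
ACGTAC
EOF
sort_fasta_by_length "$demo"
rm -f "$demo"
```

The first awk pass emits one `length, header, sequence` line per record, so an ordinary numeric `sort` can reorder whole records before the second pass restores the FASTA layout.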

k_filter

K-mer abundance noise filtering. This step is experimental and optional. It may reduce the time it takes to perform the clustering.

lederhosen k_filter --input=joined.fasta --output=filtered.fasta --k=10 --cutoff=50

cluster

Cluster reads using UCLUST. Output is a uc file.

lederhosen cluster --input=sorted.fasta --identity=0.80 --output=clusters.uc

uc_filter

Filter the UC file, removing singleton clusters or clusters that are present in only a few samples. This greatly reduces noise in the data without removing many reads.

lederhosen uc_filter --input=clusters.uc --output=clusters.uc.filtered --reads=50 --samples=10
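The kind of filtering described here can be sketched with awk. This is one reading of the description, not Lederhosen's actual rule: a cluster is kept only if it has at least a minimum number of reads in total and appears in at least a minimum number of samples. The function name `uc_filter_sketch` is hypothetical, and so is the assumption that each read label has the form `sample:read_id`. The sketch relies only on the standard UC layout, where field 1 is the record type (S for a seed, H for a hit), field 2 is the cluster number and field 9 is the read label.

```shell
#!/bin/sh
# Usage: uc_filter_sketch clusters.uc MIN_READS MIN_SAMPLES > filtered.uc
# ASSUMPTION: read labels (field 9) look like "sample:read_id".
uc_filter_sketch() {
  awk -F '\t' -v min_reads="$2" -v min_samples="$3" '
    # First pass: count reads and distinct samples per cluster.
    NR == FNR {
      if ($1 == "S" || $1 == "H") {
        reads[$2]++
        split($9, label, ":")
        if (!(($2, label[1]) in seen)) { seen[$2, label[1]] = 1; samples[$2]++ }
      }
      next
    }
    # Second pass: keep comment lines and records of surviving clusters.
    /^#/ { print; next }
    reads[$2] >= min_reads && samples[$2] >= min_samples
  ' "$1" "$1"
}
```

Under these assumptions, `uc_filter_sketch clusters.uc 50 10` would mirror the thresholds used in the command above. The file is read twice so that the counts are complete before any record is printed.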

otu_table

Create an OTU abundance table where rows are samples and columns are clusters. The entries are the number of reads for that cluster in a sample.

lederhosen otu_table --clusters=clusters.uc --output=otu_prefix.csv
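The table layout described above (samples as rows, clusters as columns, read counts as entries) can be sketched with awk. This is an illustration rather than Lederhosen's code: the function name `otu_table_sketch`, the `cluster-N` column names and the `sample:read_id` label format are all hypothetical. Only S (seed) and H (hit) records of the UC file are counted, since each represents one read.

```shell
#!/bin/sh
# Usage: otu_table_sketch clusters.uc > otu_table.csv
# ASSUMPTION: read labels (UC field 9) look like "sample:read_id".
otu_table_sketch() {
  awk -F '\t' '
    $1 == "S" || $1 == "H" {
      split($9, label, ":")
      sample = label[1]
      count[sample, $2]++
      # Remember samples and clusters in order of first appearance.
      if (!(sample in s_seen)) { s_seen[sample] = 1; s_order[++ns] = sample }
      if (!($2 in c_seen))     { c_seen[$2] = 1;     c_order[++nc] = $2 }
    }
    END {
      line = "sample"
      for (j = 1; j <= nc; j++) line = line ",cluster-" c_order[j]
      print line
      for (i = 1; i <= ns; i++) {
        line = s_order[i]
        for (j = 1; j <= nc; j++) line = line "," (count[s_order[i], c_order[j]] + 0)
        print line
      }
    }
  ' "$1"
}
```

Adding 0 to each entry makes clusters that are absent from a sample show up as an explicit 0 instead of an empty cell.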

rep_reads

Get representative reads for each cluster. Output is a single fasta file.

lederhosen rep_reads --clusters=clusters.uc --joined=joined.fasta --output=representative_reads.fasta

split

Get all reads belonging to each cluster. Output is a directory containing a fasta file for each cluster. The fasta file contains the joined reads.

lederhosen split --clusters=clusters.uc --reads=joined.fasta --min-clst-size=100

name

Identify clusters in a database using the representative reads. This is a simple wrapper for BLAT. The output is a tab-delimited file similar to a BLAST output file. For this step you need to have BLAT installed and also a TaxCollector database.

lederhosen name --reps=representative_reads.fasta --database taxcollector.fa --output blast_like_output.txt

add_names

Add phylogenetic classification of clusters to OTU abundance file.

lederhosen add_names --blat=blat_output.txt --level=taxonomic_level --table=otu_file.csv --output=named_out_file.csv

Where taxonomic_level can be: kingdom, domain, phylum, class, order, family, genus or species. This method only works with a TaxCollector database.

squish

Squish an OTU abundance file by column name (phylogenetic description).

lederhosen squish --csv-file=named_out_file.csv --output=squished_named_out_file.csv
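One way to picture squishing is as merging columns that share a name. The sketch below sums the counts of identically named columns in a simple CSV. It assumes that summing is the intended way to combine columns, that the first column holds the sample name, and that no field contains a comma. The function name `squish_sketch` is hypothetical and this is not Lederhosen's implementation.

```shell
#!/bin/sh
# Usage: squish_sketch named_out_file.csv > squished.csv
# ASSUMPTION: columns with the same header are combined by summing their counts.
squish_sketch() {
  awk -F ',' '
    NR == 1 {
      # Record each column name and the distinct names in first-seen order.
      for (j = 2; j <= NF; j++) {
        name[j] = $j
        if (!($j in seen)) { seen[$j] = 1; order[++n] = $j }
      }
      line = $1
      for (k = 1; k <= n; k++) line = line "," order[k]
      print line
      next
    }
    {
      for (k = 1; k <= n; k++) total[order[k]] = 0
      for (j = 2; j <= NF; j++) total[name[j]] += $j
      line = $1
      for (k = 1; k <= n; k++) line = line "," total[order[k]]
      print line
    }
  ' "$1"
}
```

For example, two columns both named Bacteroides with counts 1 and 3 in a sample would become a single Bacteroides column with count 4.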
