Plasmid Database

Follow these steps to download a reliable and complete plasmid database. The process takes several hours but only needs to be done once.

1. Download plasmid database info file:

ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/plasmids.txt
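Since the whole procedure only needs to run once, the download can be wrapped so that it is skipped when the file is already present. This is a minimal sketch, assuming `curl` is installed; `fetch_once` is a hypothetical helper name, not part of PlasmidID.

```shell
# fetch_once URL DEST: download URL to DEST unless DEST already exists and is non-empty.
fetch_once() {
  url=$1
  dest=$2
  if [ -s "$dest" ]; then
    echo "skip: $dest already present"
    return 0
  fi
  # -f: fail on server errors; -sS: quiet, but still report errors.
  # Download to a temporary name first so an interrupted transfer
  # never leaves a truncated file under the final name.
  curl -fsS -o "$dest.part" "$url" && mv "$dest.part" "$dest"
}
```

It would be called as `fetch_once "ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/plasmids.txt" plasmids.txt`.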

2. Extract sequences from all accession numbers into a FASTA file using eutils:

This command writes a raw FASTA file, plasmids.fna, containing about 12,000 sequences:

for i in $(awk 'BEGIN{FS="\t"} (NR>2) {if ($6 ~ "N") {print $6} else {print $7}}' plasmids.txt); do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=$i&retmode=text&rettype=fasta"; done > plasmids.fna
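To make the accession selection easier to follow, here is the same awk filter run on a toy file. The rows and accession numbers are made up for illustration. The layout (two header lines, then tab-separated rows with one accession in column 6 and an alternative in column 7) is an assumption inferred from the command itself, not taken from the real file.

```shell
# Toy stand-in for plasmids.txt (made-up rows, assumed layout).
sample=$(mktemp)
printf '%s\n' '#header line 1' '#header line 2' > "$sample"
printf 'org1\tBacteria\tg\tsg\tpA\tNC_000001.1\tAB000001.1\n' >> "$sample"
printf 'org2\tBacteria\tg\tsg\tpB\t-\tCP000002.1\n' >> "$sample"

# Skip the first two lines; print column 6 when it contains an "N",
# otherwise fall back to column 7.
accessions=$(awk 'BEGIN{FS="\t"} (NR>2) {if ($6 ~ "N") {print $6} else {print $7}}' "$sample")
printf '%s\n' "$accessions"
rm -f "$sample"
```

The first toy row yields its column 6 value and the second falls back to column 7. On the real file, piping the same awk output to `wc -l` shows how many requests the download loop will make.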

3. Remove concepts

From the PlasmidID folder, run:

filter_fasta.sh -i PATH/TO/FILE/plasmids.fna -N -l gene -l partial -l putative -l protein -l hypothetical -l unnamed -o PATH/TO/FILE -n plasmids

This creates a file named plasmids_term.fasta. The -o argument sets the output directory and -n sets the base file name.
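The idea behind this step can be sketched with plain awk on a toy FASTA file. This assumes that -N together with the -l terms drops every record whose header line contains one of the terms. That reading is inferred from the command above and this is not the implementation of filter_fasta.sh. The records are made up, and the match here is case-sensitive.

```shell
# Toy FASTA with made-up records.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>seq1 Escherichia coli plasmid pX, complete sequence
ACGT
ACGT
>seq2 Escherichia coli plasmid pY hypothetical protein gene
GGCC
>seq3 Klebsiella plasmid pZ, partial sequence
TTAA
EOF

# Terms from the command above, joined into one regular expression.
terms='gene|partial|putative|protein|hypothetical|unnamed'

# On each header line decide whether to keep the record;
# sequence lines inherit the decision of their header.
kept=$(awk -v pat="$terms" '/^>/ {keep = ($0 !~ pat)} keep' "$fasta")
printf '%s\n' "$kept"
rm -f "$fasta"
```

Only seq1 survives, because the headers of seq2 and seq3 contain at least one of the terms.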

4. Remove redundancy

From the PlasmidID folder, run:

cdhit_cluster.sh -i PATH/TO/FILE/plasmids_term.fasta -p -c 100 -M 20000 -T 8

NOTE:

-i is the path to the input FASTA file (plasmids_term.fasta in the command above)
The output is written to the same directory as the input
Memory (-M) and number of threads (-T) can be adjusted to suit the computer that runs this command

NOTE2:

This step is optional: PlasmidID works with any DNA database. Removing redundancy reduces execution time, and any other clustering software can be used instead.
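To see how much a step reduced the database, it can help to compare sequence counts before and after. This is a minimal sketch on toy data; `count_seqs` is a hypothetical helper, not part of PlasmidID.

```shell
# count_seqs FILE: number of FASTA records (lines starting with '>').
count_seqs() {
  grep -c '^>' "$1"
}

# Toy FASTA standing in for a real database file.
toy=$(mktemp)
printf '>a\nACGT\n>b\nGGCC\nTTAA\n>c\nACGT\n' > "$toy"
n=$(count_seqs "$toy")
echo "$n sequences"
rm -f "$toy"
```

Running `count_seqs` on plasmids.fna should give a figure near the 12,000 sequences mentioned in step 2, and running it on the filtered and clustered files shows how many records each step removed.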
