 __  __ __  __ ______ _______ _____ _____     _____ _      ______          _   _ _    _ _____
 |  \/  |  \/  |  ____|__   __/ ____|  __ \   / ____| |    |  ____|   /\   | \ | | |  | |  __ \
 | \  / | \  / | |__     | | | (___ | |__) | | |    | |    | |__     /  \  |  \| | |  | | |__) |
 | |\/| | |\/| |  __|    | |  \___ \|  ___/  | |    | |    |  __|   / /\ \ | . ` | |  | |  ___/
 | |  | | |  | | |____   | |  ____) | |      | |____| |____| |____ / ____ \| |\  | |__| | |
 |_|  |_|_|  |_|______|  |_| |_____/|_|       \_____|______|______/_/    \_|_| \_|\____/|_|

The algorithm identifies contigs that are putatively cross-contaminated by finding pairs of identical or nearly identical sequences. It then uses coverage information to distinguish between cross-contaminated and clean sequences. The algorithm therefore assumes that contaminating DNA and the resulting reads are always present in a smaller amount than correct DNA and reads.
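The coverage-based decision described above can be illustrated with a minimal Ruby sketch. The method name, its arguments, and the ratio value of 10 are assumptions made for illustration; they are not taken from the pipeline's code.

```ruby
# Hypothetical illustration of the coverage-based decision; not the pipeline's actual code.
# Given the read coverage of two near-identical contigs from different datasets,
# flag the one with much lower coverage as the suspected contaminant.
def contaminated_side(left_coverage, right_coverage, coverage_ratio)
  # With no reads on either side there is nothing to compare.
  return :undecided if left_coverage.zero? && right_coverage.zero?

  if left_coverage >= right_coverage * coverage_ratio
    :right      # left is far better covered, so right is the suspected contaminant
  elsif right_coverage >= left_coverage * coverage_ratio
    :left       # right is far better covered, so left is the suspected contaminant
  else
    :undecided  # coverages are too similar to call either contig contaminated
  end
end

puts contaminated_side(250.0, 4.0, 10)  # prints "right"
puts contaminated_side(12.0, 9.0, 10)   # prints "undecided"
```

The sketch encodes the stated assumption that contaminating reads are always rarer than genuine ones: the contig with the clearly lower coverage is the one treated as contamination.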

Installation

A deployed and prepared environment for running MMETSP cleanup is provided inside a virtual machine.

Requirements

150 GB of free disk space
20+ GB of RAM

Preparation (examples are for Ubuntu)

Install VirtualBox:

$ sudo apt-get install virtualbox

Install Vagrant:

$ sudo apt-get install vagrant

Install the vagrant-disksize plugin:

$ vagrant plugin install vagrant-disksize

Create an empty folder and download the preparation shell script into it:

$ wget http://kolisko-lab.bc.cas.cz/mmetsp_cleanup/download_box.sh

Make the script executable and run it:

$ chmod +x download_box.sh
$ ./download_box.sh

This script downloads all the files needed for the virtual machine (~25 GB).

Add the box to the Vagrant environment:

$ vagrant box add mmetsp_cleanup mmetsp_cleanup.box

Start the virtual machine:

$ vagrant up

How to use

When the virtual machine is ready, you can connect to it using SSH:

$ vagrant ssh

Main configuration files for editing:

settings.yml

/home/vagrant/mmetsp_data/settings.yml - the coverage_ratio thresholds are defined there

winston.hits_filtering.len_ratio - minimal qcovhsp (query coverage per HSP) for hits filtering
winston.hits_filtering.len_minimum - minimal hit length for hits filtering
winston.coverage_ratio.REGULAR - read coverage ratio for the REGULAR dataset pair type (the minimal difference between the coverage of the LEFT_ORG and RIGHT_ORG contigs for one of them to be considered contaminated; lower values make the contamination prediction stricter, so fewer contaminations will be found)
winston.coverage_ratio.CLOSE - read coverage ratio for the CLOSE dataset pair type
winston.coverage_ratio.LEFT_EATS_RIGHT - read coverage ratio for the LEFT_EATS_RIGHT dataset pair type
winston.coverage_ratio.RIGHT_EATS_LEFT - read coverage ratio for the RIGHT_EATS_LEFT dataset pair type
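
To make the key names above concrete, here is a minimal Ruby sketch that reads them from a YAML document. The nesting is inferred from the dotted key names and is an assumption, and every value is a placeholder rather than a shipped default or a recommendation.

```ruby
require 'yaml'

# Hypothetical settings.yml content: the nesting is inferred from the dotted
# key names, and all values are placeholders chosen only for illustration.
SETTINGS_TEXT = <<~SETTINGS
  winston:
    hits_filtering:
      len_ratio: 70
      len_minimum: 100
    coverage_ratio:
      REGULAR: 2.0
      CLOSE: 3.0
      LEFT_EATS_RIGHT: 5.0
      RIGHT_EATS_LEFT: 5.0
SETTINGS

settings = YAML.safe_load(SETTINGS_TEXT)

# Read one threshold by walking the nested keys.
regular_ratio = settings.dig('winston', 'coverage_ratio', 'REGULAR')
puts regular_ratio

# List every dataset pair type that has a coverage ratio defined.
pair_types = settings.dig('winston', 'coverage_ratio').keys
puts pair_types.join(', ')
```

Check the actual settings.yml inside the VM for the real layout and values before editing it.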

types.csv

/home/vagrant/mmetsp_data/types.csv - file with types and thresholds for datasets. It contains all possible combinations of dataset pairs.

The structure of the file:

LEFT_ORG_ID,RIGHT_ORG_ID,THRESHOLD,TYPE

THRESHOLD - (float) minimal percentage of identity of a BLAST hit for it to be considered suspicious.
TYPE - (string) dataset pair type, as described in settings.yml
- TYPES:
  - REGULAR - two unrelated organisms
  - CLOSE - evolutionarily close species, with a more stringent setting for contamination identification to reduce false positives
  - LEFT_EATS_RIGHT and RIGHT_EATS_LEFT - for situations where one of the sequenced organisms is also present in other cultures as a food source
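
The four-column layout can be illustrated with a short Ruby sketch that parses a few rows and validates the TYPE column. The dataset IDs, thresholds, and the presence of a header row are assumptions for illustration only; plain splitting on commas is used because the fields are simple unquoted values.

```ruby
# Hypothetical types.csv content: the IDs, thresholds, and the header row are
# placeholders that only illustrate the four-column layout.
TYPES_TEXT = <<~ROWS
  LEFT_ORG_ID,RIGHT_ORG_ID,THRESHOLD,TYPE
  MMETSP0001,MMETSP0002,98.0,REGULAR
  MMETSP0001,MMETSP0003,99.5,CLOSE
  MMETSP0002,MMETSP0003,98.0,LEFT_EATS_RIGHT
ROWS

VALID_TYPES = %w[REGULAR CLOSE LEFT_EATS_RIGHT RIGHT_EATS_LEFT].freeze

# Split the text into a header row and data rows.
header, *rows = TYPES_TEXT.lines.map { |line| line.strip.split(',') }

pairs = rows.map do |fields|
  row = header.zip(fields).to_h
  # Reject any type that is not one of the four documented pair types.
  raise ArgumentError, "unknown type #{row['TYPE']}" unless VALID_TYPES.include?(row['TYPE'])

  { left: row['LEFT_ORG_ID'], right: row['RIGHT_ORG_ID'],
    threshold: Float(row['THRESHOLD']), type: row['TYPE'] }
end

pairs.each do |pair|
  puts "#{pair[:left]} vs #{pair[:right]}: #{pair[:type]}, identity >= #{pair[:threshold]}%"
end
```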

In the VM home folder, you will find the script run.sh, which starts the MMETSP cleanup pipeline. To start the process, simply run that script:

$ ./run.sh

The results will appear in the folder: /home/vagrant/mmetsp_data/results/

Tips

To exit the virtual machine terminal session, type:

$ exit

To stop the running virtual machine from your local computer, type:

$ vagrant halt

To view the status of the VM, type:

$ vagrant status

You can share files between the VM and your local computer by putting them in the folder that contains the Vagrantfile. They will appear in the VM in the /vagrant folder. This can be useful if you don't want to edit types.csv and settings.yml from inside the VM.

Contact

Email me and I can solve your problems and answer your questions: [email protected]

🚧 Preparation pipeline (for internal purposes only)

First of all: check and fix names

check_datasets.rb

Receives a path to the dataset folder with all the .fas files (--datasets_path).

Checks whether the file name and the MMETSP name of the contigs are equal.

As a result, it builds a wrong_names.csv file with the structure: file_name,name_of_contigs.

The script also ensures that all the contigs belong to the same MMETSP sample.
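
The name check can be sketched in a few lines of Ruby. This is a hypothetical illustration: the FASTA header format, the regular expression, and the method name are assumptions and are not taken from check_datasets.rb.

```ruby
# Hypothetical sketch of the name check; the header format and the regular
# expression are assumptions, not taken from check_datasets.rb.
MMETSP_ID = /MMETSP\d+/

# Returns the distinct MMETSP IDs found in FASTA header lines that differ from
# the ID in the file name. Headers without an MMETSP ID are ignored here.
def wrong_contig_names(file_name, fasta_lines)
  expected = file_name[MMETSP_ID]
  fasta_lines
    .select { |line| line.start_with?('>') }
    .map { |header| header[MMETSP_ID] }
    .compact
    .uniq
    .reject { |id| id == expected }
end

lines = ['>MMETSP0005_contig_1', 'ACGT', '>MMETSP0005_contig_2', 'GGCA']
wrong_contig_names('MMETSP0004.fas', lines).each do |id|
  # One wrong_names.csv row in the form file_name,name_of_contigs.
  puts "MMETSP0004.fas,#{id}"
end
```

If the returned list holds more than one ID, the contigs in that file come from more than one sample.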

fix_datasets.rb

Receives a path to the wrong_names.csv file (--wrong_names_path) and a path to the datasets folder (--datasets_path).

In each file with a wrong contig name, the script replaces the MMETSP name of the contigs with the name taken from the file name.
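
A minimal Ruby sketch of this renaming step follows. It is hypothetical: the header format and the method name are assumptions, and the real fix_datasets.rb may work differently.

```ruby
# Hypothetical sketch of the renaming step; not the code of fix_datasets.rb.
# Rewrites the MMETSP ID in every FASTA header to the ID found in the file name.
def fix_fasta_headers(file_name, fasta_lines)
  correct = File.basename(file_name)[/MMETSP\d+/]
  raise ArgumentError, "no MMETSP ID in #{file_name}" if correct.nil?

  fasta_lines.map do |line|
    # Only header lines carry a contig name; sequence lines pass through unchanged.
    line.start_with?('>') ? line.sub(/MMETSP\d+/, correct) : line
  end
end

fixed = fix_fasta_headers('datasets/MMETSP0004.fas', ['>MMETSP0005_contig_1', 'ACGT'])
puts fixed
```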

fix_one_vs_all.rb

Replaces all the occurrences of "wrong names" in each one vs all .blastab file.

fix_all_vs_all.rb

Replaces all the occurrences of "wrong names" in each all vs all .blastab file.

check_hits.rb

Checks whether the files in the provided folder contain wrong names.

Archive all the data

tar cvf datasets.tar.gz datasets/*.fas
tar cvf one_vs_all.tar.gz one_vs_all/*.blastab
tar cvf all_vs_all.tar.gz all_vs_all/*.blastab

Preparation

prepare.rb

Receives the three .tar archives with:

datasets (.fas files)
one vs all hits (.blastab)
all vs all BLAST hits (.blastab)

As output, it creates the prepared structure of a Decross project.

Make a coverage database

#TODO
