SPHERE

Synthetic PHasE Rate Estimator by single metagenome sequence

Requirements

Python3
Libraries (automatically installed with pip)
- Numpy
- Scipy
- Pandas
- matplotlib
- PyStan

How to use

Install

git clone [email protected]:TaskeHAMANO/SPHERE.git
cd SPHERE
pip install .

Pre-calculation

Before using sphere, you have to calculate coverage depth. Here is a example by bowtie2 and Samtools.

Building index file for alignment

bowtie2-build -f <Genome_sequence_path> <index_prefix>

Aligning sequence file to the index file

Next, align with metagenome sequence file with constructed index.

# for single end sequence
bowtie2 -x <index_prefix> -U <Metagenome_sequence_path> -S <Samfile_path>

# for paired end sequence
bowtie2 -x <index_prefix> -1 <Metagenome_foward_sequence_path> -2 <Metagenome_reverse_sequence_path> -S <Samfile_path>

Counting coverage depth

Coverage depth of metagenomic sequence is calculated by samtools. In this procedure, please use -aa option to save zero-coverage depth.

samtools sort -O sam -o <Sorted_samfile_path> <Samfile_path>
samtools depth -a -a <Sorted_samfile_path> > <Coverage_depth_path>

(Optional) Select coverage depthes for single genome

If the template sequence file contains multiple sequences, select the depthes with sequence name

grep <Sequence_name> <Coverage_depth_path> > <Selcted_coverage_depth_path>

(Optional) Compression/Noise reduction by rolling median filter

If ...

The length of sequence is too big to estimate as you have only small memory,
The coverage depth is too variable to estimate,

it can be smoothed and compressed by median filter command in sphere. When you setted strided_length=100 and window_length=100, for each 100 bp, the median of coverage depth is sampled. Note that when we used too greater number in window_length and average coverage depth is shallow, it became easy to fail in estimation. Thus I recommend to use smaller value than 100 for both parameter.

sphere_filter -s <strided_length> -w <window_length> <Filtered_coverage_depth_path> <Coverage_depth_path>

Estimation

Sampling mode

This mode performs sampling from the posterior distribution by (quasi-) Markov chain Monte Carlo algorithm. NUTS sampler is used in default.

sphere_estimate -m vonmises -M sampling <Estimated_result_path> [<Coverage_depth_path1> <Coverage_depth_path2> ...]

The meaning of statistics are followed by Stan. In short,

mean: expected a posteriori(EAP) of the parameter
se_mean: standard error
sd: posterior standard deviation
2.5%, 25%, 50%, 75%, 97.5%: posterior percentile
n_eff: Effective sample size
Rhat: determination index of convergence.
- If it is greater than 1.1, the MCMC trial doesn't reach to convergence

Pros.

Estimate parameters from complicated model
- When you try to use mixture models, it'll be better to use MCMC algorithm
Parameter estimation range is available

Cons.

Takes a long time
Requires much memories

Optimizing mode

This mode performs optimizing of the parameters by maximizing the joint posterior. L-BFGS algorithm is used in default.

sphere_estimate -m vonmises -M optimizing <Estimated_result_path> <Coverage_depth_path1> [<Coverage_depth_path2> ...]

Only maximum likelihood estimates(MLE) were calculated.

Pros.

Fast
Low memory requirement
Give us a reasonable estimates for simple model

Cons.

Not good for complicated models
Only give us a MLE
High dependence for initialization seed
- It would be better to estimate multiple seeds. And check the variance of the parameters.

Utilities

Some of utilities for coverage depth analysis are equipped with sphere.

Coverage depth visualization

This scripts are useful if you want to know coverage depth trend before using sphere_estimate. See sphere_dplot --help for more details.

sphere_dplot <plot_path> <Coverage_depth_path>

Estimated coverage depth trend visualization

This scripts are useful if you want to know estimated depth trend after estimation. See sphere_mcplot --help for more details.

sphere_mcplot -m vonmises -M sampling <plot_path> <Coverage_depth_path> <Estimated_result_path>

Circular stats

Compute directional stats from coverage depth file. You can calculate mean resultant length of the coverage depth by this tool. See sphere_cstats --help for more details.

sphere_cstats <stats_path> <Coverage_depth_path1> [<Coverage_depth_path2> ...]

Pewsey's symmetric test

Compute stats and p-value of Pewsey's symmetric test following (Pewsey, 2002). See sphere_symtest for more details.

sphere_symtest <stats_path> <Coverage_depth_path1> [<Coverage_depth_path2> ...]

Q&A

Fitting parameter by optimizing fails, and it returns message as "LS failed, Hessian reset". What is this?

See previous issue submitted to prophet, which also uses Stan as background estimator.

For this problem, there are a few solutions.

Use Newton algorithm instead of L-BFGS algorithm. If L-BFGS is rejected, it automatically runs.
- However, it also likely to achieve local minimum.
Use different random seed in sphere_estimater comamnd. It can be set by -ss option.
Use MCMC algorithm by -M sampling option.

License

BSD-3-Clause

Name		Name	Last commit message	Last commit date
Latest commit History 401 Commits
sphere		sphere
stan		stan
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SPHERE

Requirements

How to use

Install

Pre-calculation

Building index file for alignment

Aligning sequence file to the index file

Counting coverage depth

(Optional) Select coverage depthes for single genome

(Optional) Compression/Noise reduction by rolling median filter

Estimation

Sampling mode

Pros.

Cons.

Optimizing mode

Pros.

Cons.

Utilities

Coverage depth visualization

Estimated coverage depth trend visualization

Circular stats

Pewsey's symmetric test

Q&A

Fitting parameter by optimizing fails, and it returns message as "LS failed, Hessian reset". What is this?

License

About

Releases 1

Packages

Languages

License

suzuki-shm/SPHERE

Folders and files

Latest commit

History

Repository files navigation

SPHERE

Requirements

How to use

Install

Pre-calculation

Building index file for alignment

Aligning sequence file to the index file

Counting coverage depth

(Optional) Select coverage depthes for single genome

(Optional) Compression/Noise reduction by rolling median filter

Estimation

Sampling mode

Pros.

Cons.

Optimizing mode

Pros.

Cons.

Utilities

Coverage depth visualization

Estimated coverage depth trend visualization

Circular stats

Pewsey's symmetric test

Q&A

Fitting parameter by optimizing fails, and it returns message as "LS failed, Hessian reset". What is this?

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages