Bismark Coverage Curves

Bismark is a tool used for aligning Bisulfite-Sequencing libraries, giving information about DNA methylation.

Amongst other things, Bismark can generate coverage reports which state the number of observations made of each cytosine. This script takes these reports as input and plots the proportion of cytosines seen at increasing levels of fold coverage.

This is useful because BS-Seq analysis needs a coverage threshold: methylation percentages calculated from only a few observations are unreliable and can skew the results. These plots help you choose an appropriate cut-off.
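
To make the idea concrete, here is a minimal Python sketch of how such a curve could be computed. It is not the script's own Perl code. It assumes the six-column tab-separated Bismark coverage format (chromosome, start, end, methylation percentage, methylated count, unmethylated count) and reads the curve cumulatively, as the fraction of observed cytosines with at least each level of coverage. The function names `read_coverage` and `coverage_curve` are hypothetical.

```python
from collections import Counter
import io

def read_coverage(handle):
    """Yield per-cytosine coverage from Bismark .cov lines.

    Assumes six tab-separated columns: chromosome, start, end,
    methylation percentage, methylated count, unmethylated count.
    """
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        yield int(fields[4]) + int(fields[5])

def coverage_curve(coverages, min_cov=0, max_cov=15, binsize=1):
    """Return {threshold: fraction of cytosines with coverage >= threshold}."""
    counts = Counter(coverages)
    total = sum(counts.values())
    curve = {}
    for threshold in range(min_cov, max_cov + 1, binsize):
        passing = sum(n for cov, n in counts.items() if cov >= threshold)
        curve[threshold] = passing / total if total else 0.0
    return curve

# Four made-up cytosines with coverage 2, 5, 12 and 4.
example = io.StringIO(
    "chr1\t100\t100\t50.0\t1\t1\n"
    "chr1\t150\t150\t100.0\t5\t0\n"
    "chr1\t200\t200\t0.0\t0\t12\n"
    "chr1\t250\t250\t25.0\t1\t3\n"
)
curve = coverage_curve(read_coverage(example), max_cov=12, binsize=4)
for threshold, fraction in curve.items():
    print(f"{threshold}x\t{fraction:.2f}")
```

With this toy input, three of the four cytosines survive a 4x cut-off but only one survives 8x, which is the kind of drop-off the plot is meant to reveal.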

Additional options allow you to interrogate coverage on different reference strands and within regions of interest, as specified by a BED file.
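
As an illustration of the regions option, the Python sketch below sorts cytosine positions into "inside" and "outside" a set of BED intervals. It is an assumption about how such a split could work, not a description of the script's internals. It assumes BED intervals are 0-based and half-open, that coverage file positions are 1-based, and the helper names `load_regions` and `in_regions` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

def load_regions(bed_lines):
    """Build {chromosome: sorted, merged [(start, end), ...]} from BED lines.

    BED intervals are 0-based and half-open: [start, end).
    """
    by_chrom = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and header lines
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlapping or touching intervals are merged into one.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged

def in_regions(regions, chrom, pos_1based):
    """True if a 1-based position falls inside any region on chrom."""
    intervals = regions.get(chrom, [])
    pos = pos_1based - 1  # convert to 0-based to match BED
    # Index of the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]

regions = load_regions(["chr1\t99\t200\n", "chr1\t150\t300\n", "chr2\t0\t10\n"])
print(in_regions(regions, "chr1", 100))  # first base of the merged region
print(in_regions(regions, "chr1", 301))  # one base past the region end
```

Merging the intervals first means each lookup is a single binary search, which matters when a coverage file holds millions of positions.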

Example Output

Bismark Coverage Curves Plot

See additional text output

Usage

perl bismark_coverage_curves.pl <coverage_file.cov>

For nicer fonts, download the OpenSans-Regular.ttf font into the same directory as the script. Font is from Google Fonts.

Parameters

This script is run on the command line. The following flags control how it runs.

| Command Line Flag | Description |
|---|---|
| `--regions <regions.bed>` | Default: None. Supply a BED file with regions of interest. The script will show coverage inside and outside these regions. |
| `--stranded` | Default: No. Split the report up into forward and reverse strands. |
| `--min_cov` | Default: 0x. The minimum coverage limit to consider / plot. |
| `--max_cov` | Default: 15x; 50x if `--regions` is set. The maximum coverage limit to consider / plot. |
| `--binsize` | Default: 1. The coverage bin size to use: the step size between `--min_cov` and `--max_cov`. |
| `--numlines` | Default: 1000000. Number of lines to process. More lines give more accuracy but take longer to run. Note: if the input is sorted and your sample is biased, it's a good idea to specify a large number. |
| `--append` | Default: _coverageStats.txt. String to append to results filenames. |
| `--quiet` | Suppress status messages. |
| `--help` | Print help message. |
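
The interaction between the coverage flags can be sketched in a few lines of Python. This is only an illustration of the defaults listed above, not the script's own logic. It assumes that the thresholds include `--max_cov` itself and that `--numlines` simply takes the first lines of the file. The function names are hypothetical.

```python
from itertools import islice

def coverage_thresholds(min_cov=0, max_cov=None, binsize=1, regions=False):
    """List the coverage levels to evaluate, using the documented defaults.

    Assumption: max_cov is inclusive. The default maximum is 15x, or 50x
    when a regions file is supplied.
    """
    if max_cov is None:
        max_cov = 50 if regions else 15
    return list(range(min_cov, max_cov + 1, binsize))

def first_lines(handle, numlines=1_000_000):
    """Lazily take at most numlines lines from the start of the input."""
    return islice(handle, numlines)

print(coverage_thresholds(binsize=5))
print(coverage_thresholds(regions=True)[-1])
```

Taking only the first lines of a sorted file samples just the first chromosomes, which is why a larger `--numlines` is suggested for sorted input.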

Dependencies

The script is written in Perl and run on the command line. The following core Perl modules are required for generating the numbers:

To plot the graphs, you'll also need the following modules:


Bismark Window Sizes

Bismark is a tool used for aligning Bisulfite-Sequencing libraries, giving information about DNA methylation.

In addition to setting coverage thresholds for individual cytosines, it can help to set thresholds for the number of different cytosines to be counted within each window. This script takes coverage reports as input and plots the percentage of windows retained at increasing window sizes.

Additional options allow you to restrict the analysis to cytosines on one reference strand, set the minimum coverage a cytosine needs in order to be counted, set how many such cytosines a window must contain to pass, and limit the windows to those overlapping regions of interest, as specified by a BED file.
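
The calculation can be sketched in Python as follows. This is a simplified illustration, not the script's own Perl code. It assumes fixed, non-overlapping windows and takes the denominator to be the windows containing at least one observed cytosine; the script may define both differently. The function name `windows_retained` is hypothetical.

```python
from collections import Counter

def windows_retained(cytosines, window_size, coverage=10, min_count=1):
    """Percentage of windows holding >= min_count well-covered cytosines.

    cytosines: iterable of (chromosome, 1-based position, coverage) tuples.
    Assumptions: fixed, non-overlapping windows; the denominator is the
    set of windows that contain at least one observed cytosine.
    """
    seen = set()
    passing = Counter()
    for chrom, pos, cov in cytosines:
        window = (chrom, (pos - 1) // window_size)
        seen.add(window)
        if cov >= coverage:
            passing[window] += 1
    if not seen:
        return 0.0
    kept = sum(1 for window in seen if passing[window] >= min_count)
    return 100.0 * kept / len(seen)

# Made-up observations: (chromosome, position, coverage).
cytosines = [
    ("chr1", 10, 12),
    ("chr1", 50, 15),
    ("chr1", 120, 3),
    ("chr1", 260, 11),
    ("chr2", 5, 20),
]
for size in (100, 1000):
    for min_count in (1, 2):
        pct = windows_retained(cytosines, size, coverage=10, min_count=min_count)
        print(f"{size}bp windows, min_count={min_count}: {pct:.1f}% retained")
```

In this toy data, requiring two well-covered cytosines keeps 25% of 100bp windows but 50% of 1000bp windows, showing how larger windows trade resolution for retention.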

Example Output

Bismark Window Sizes Plot

Bismark Window Sizes Plot

See additional text output: first plot, second plot

Usage

perl bismark_window_sizes.pl <coverage_file.cov>

For nicer fonts, download the OpenSans-Regular.ttf font into the same directory as the script. Font is from Google Fonts.

Parameters

This script is run on the command line. The following flags control how it runs.

| Command Line Flag | Description |
|---|---|
| `--regions <regions.bed>` | Default: None. Supply a BED file with regions of interest. Only reads and windows overlapping these regions will be considered. |
| `--stranded <for / rev>` | Default: both. Consider reads on only one reference strand. |
| `--coverage` | Default: 10x. Minimum number of observations required to count a cytosine. |
| `--min_counts <comma separated integers>` | Default: 1,2,3,4,5,10. List of count thresholds to use: how many different cytosines must be seen within a window for it to pass. |
| `--window_sizes <comma separated integers, bp>` | Default: 100bp,200bp,300bp,400bp,500bp,1kbp,1.5kbp,2kbp,3kbp,4kbp,5kbp,10kbp,20kbp,30kbp,40kbp,50kbp,100kbp,200kbp,300kbp,400kbp,500kbp,1mbp,2mbp. Window sizes to use. Specify in base pairs. |
| `--append` | Default: _coverageStats.txt. String to append to results filenames. |
| `--quiet` | Suppress status messages. |
| `--help` | Print help message. |
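
The default window sizes above are written with unit suffixes, while the flag takes plain integers in base pairs. A small Python helper like the one below could convert a list of labels into a value for `--window_sizes`. It is a convenience sketch only: the name `parse_window_size` is hypothetical, and nothing here implies the script itself accepts suffixed values.

```python
def parse_window_size(label):
    """Convert a label such as '1.5kbp' to an integer number of base pairs."""
    # Longer suffixes come first so 'kbp' and 'mbp' are tried before 'bp'.
    units = {"mbp": 1_000_000, "kbp": 1_000, "bp": 1}
    text = label.strip().lower()
    for suffix, factor in units.items():
        if text.endswith(suffix):
            return int(round(float(text[: -len(suffix)]) * factor))
    return int(text)  # plain integers are already in base pairs

defaults = (
    "100bp,200bp,300bp,400bp,500bp,1kbp,1.5kbp,2kbp,3kbp,4kbp,5kbp,"
    "10kbp,20kbp,30kbp,40kbp,50kbp,100kbp,200kbp,300kbp,400kbp,500kbp,"
    "1mbp,2mbp"
)
sizes = [parse_window_size(label) for label in defaults.split(",")]
# Join into a comma separated list of integers for the command line.
print(",".join(str(size) for size in sizes))
```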

Dependencies

The script is written in Perl and run on the command line. The following core Perl modules are required for generating the numbers:

To plot the graphs, you'll also need the following modules:

Credits

These scripts were written for use at the National Genomics Infrastructure at SciLifeLab in Stockholm, Sweden. They are part of a larger repository of NGI Visualization Scripts.

For more information, please get in touch with Phil Ewels.