Plotting Variants from a VCF
A common task in SV research is reviewing a set of SV calls stored in a VCF file. To assist with this task, samplot includes samplot vcf, a tool that plots all SVs or a subset of them and creates a simple JavaScript-based webpage for quick and easy variant review. This website may be kept locally or deployed on the web.
An example website is included in this repo and may be downloaded with the code for review. The following commands re-create the test site using data from the test/data directory of this repo (they assume the working directory is test/data).
mkdir -p test_site
samplot vcf \
-d test_site/ \
--vcf test.vcf \
--sample_ids HG002 HG003 HG004 \
-b HG002_Illumina.bam HG003_Illumina.bam HG004_Illumina.bam
The samplot vcf command above creates the index.html file for the new site, then runs samplot plot commands to create the individual images, which are stored in test_site.
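After a run like the one above, a quick sanity check of the output directory can be useful before opening the page. The following is a minimal sketch, not part of samplot: the helper name check_site is hypothetical, and it assumes the images are written as PNG files directly under the site directory (adjust the pattern if a different output type is used).

```shell
# Hypothetical helper (not part of samplot): check that a site directory
# contains index.html and print how many PNG images sit beside it.
check_site() {
    site_dir=$1
    if [ ! -f "$site_dir/index.html" ]; then
        echo "missing $site_dir/index.html" >&2
        return 1
    fi
    # Assumption: plots are PNG files; change '*.png' for other image types.
    find "$site_dir" -type f -name '*.png' | wc -l | tr -d ' '
}
```

For the example above this would be called as check_site test_site. A count of zero suggests that the samplot plot commands did not run or did not finish.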
Samplot vcf has a number of options to make analysis of large SV callsets more tractable. These options give the user control over image plotting.
- ped: specify a ped file for the samples in the VCF. This allows samplot vcf to label sample relatedness and enables the next option.
- dn_only: if the ped option is used, this outputs only SVs that violate Mendelian inheritance rules (potential de novo mutations).
- min_call_rate: only plot variants with at least this call rate (e.g. a call rate of 0.9 means 90% of samples have a non-missing call for this variant).
- max_hets: only plot variants with at most this many heterozygotes in the VCF.
- min_entries: if fewer than this many samples are HET or HOMALT for a variant, samplot vcf will attempt to add additional HOMREF samples as a control in the plot.
- max_entries: do not plot more than this many HET or HOMALT samples for each variant.
- max_mb: if a variant is longer than this (in megabases), ignore it. This was more important in the past, when samplot plot couldn't plot large variants using the zoom option.
- important_regions: if an analysis is targeted to a specific set of regions (such as genes relevant to a phenotype), use this option to specify a BED file and only output SVs within that set of regions.
- sample_ids: space-delimited list of sample IDs in the same order as the BAM/CRAM file inputs (to associate alignment files with fields in the VCF). If the BAM/CRAM files contain the RG tag, which identifies the sample ID, this is unnecessary.
- manual_run: print out samplot plot commands instead of running them immediately. This allows greater control over running the commands, for example with tools such as GNU parallel.
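As an illustration of running the commands produced with the manual_run option under your own control, the sketch below runs a file of commands several at a time. It is not part of samplot: the helper name run_plot_commands and the file name plot_commands.txt are hypothetical, and it assumes the samplot plot commands have been saved to a file with one complete command per line.

```shell
# Hypothetical helper (not part of samplot): run every non-empty line of a
# command file as its own shell job, with up to max_jobs running at once.
# Uses the -0 and -P options of xargs (available in GNU, BSD and BusyBox xargs).
run_plot_commands() {
    cmd_file=$1
    max_jobs=${2:-4}
    # Drop blank lines, then NUL-delimit so xargs passes each line through
    # verbatim (quotes intact) as a single argument to "sh -c".
    grep -v '^[[:space:]]*$' "$cmd_file" | tr '\n' '\0' |
        xargs -0 -n 1 -P "$max_jobs" sh -c 'eval "$1"' run_plot
}
```

With the commands saved in plot_commands.txt, run_plot_commands plot_commands.txt 8 would keep up to eight jobs running. Where GNU parallel is installed, parallel -j 8 < plot_commands.txt does the same job.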
Any additional options used with samplot vcf will be passed on to samplot plot for image creation. This allows the same refined image customization with samplot vcf as with samplot plot.
Samplot vcf implements a simple filtering system for user-defined filters. This allows the user to filter beyond hard-coded options like those above and fit analyses to specific needs/VCF annotations.
Specify filters with the filter option. This option can be included multiple times to allow multiple filters.
When the filter option is given multiple times, the filters are combined with a logical OR; that is, if a sample at a given variant does not pass one filter but does pass another, it will be plotted. To combine filters with a logical AND (i.e. if a sample passes one filter but not the next, it will not be plotted), join them with the & symbol in a single use of the filter option.
Example 1 (DEL or INV only):
samplot vcf \
--filter "SVTYPE == 'DEL'" \
--filter "SVTYPE == 'INV'" \
-d test_site/ \
--vcf test.vcf \
--sample_ids HG002 HG003 HG004 \
-b HG002_Illumina.bam HG003_Illumina.bam HG004_Illumina.bam
Example 2 (only DEL with at least 8 supporting reads or INV with at least 5):
samplot vcf \
--filter "SVTYPE == 'DEL' & SU >= 8" \
--filter "SVTYPE == 'INV' & SU >= 5" \
-d test_site/ \
--vcf test.vcf \
--sample_ids HG002 HG003 HG004 \
-b HG002_Illumina.bam HG003_Illumina.bam HG004_Illumina.bam
Example 3 (passing on the zoom parameter with size 1000 bp):
samplot vcf \
--zoom 1000 \
-d test_site/ \
--vcf test.vcf \
--sample_ids HG002 HG003 HG004 \
-b HG002_Illumina.bam HG003_Illumina.bam HG004_Illumina.bam