Plotting Variants from a VCF

Jon Belyeu edited this page Nov 14, 2019 · 3 revisions

A common task in structural variant (SV) research is reviewing a set of SV calls stored in a VCF file. To assist with this task, samplot includes samplot vcf, a tool that plots all SVs in a VCF, or a subset of them, and creates a simple JavaScript-based website for quick and easy variant review. This website may be kept locally or deployed on the web.

Quickstart

An example website is included in this repo and may be downloaded with the code for review. The following commands will also allow you to re-create the test site using data from the test/data directory of this repo (assumes working directory is test/data).

mkdir -p test_site
samplot vcf \
    -d test_site/ \
    --vcf test.vcf \
    --sample_ids HG002 HG003 HG004 \
    -b HG002_Illumina.bam HG003_Illumina.bam HG004_Illumina.bam

The samplot vcf command above creates the index.html file for the new site, then runs samplot plot commands to create the individual images, which are stored in test_site.
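As a quick sanity check after the quickstart, the output directory can be inspected from the shell. The sketch below is not part of samplot; `check_site` is a hypothetical helper that only relies on what is described above (an index.html file plus the generated images in the same directory). It makes no assumption about the image file extension, so it simply counts every file other than index.html.

```shell
# check_site DIR: report whether DIR looks like a finished samplot vcf site.
# Hypothetical helper, not part of samplot. Returns non-zero if index.html is missing.
check_site() {
    site_dir=${1%/}   # drop a trailing slash, if any
    if [ ! -f "$site_dir/index.html" ]; then
        echo "missing: $site_dir/index.html" >&2
        return 1
    fi
    # Count every regular file except index.html; the plotted images are
    # expected to be among these (their extension is not assumed here).
    n_other=$(find "$site_dir" -type f ! -name index.html | wc -l | tr -d ' ')
    echo "index.html present, $n_other other file(s) in $site_dir"
}

# Usage, after running the quickstart command:
#   check_site test_site/
```

If the check passes, opening test_site/index.html in a browser should show the review page.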

Additional info

Samplot vcf has a number of options to make analysis of large SV callsets more tractable.

Options

These options control which variants and samples are plotted and how the plotting commands are run.

  • ped: specify a ped file for the samples in the VCF. This allows samplot vcf to label sample relatedness and enables the next option.
  • dn_only: if the ped option is used, this outputs only SVs that violate Mendelian inheritance rules (potential de novo mutations).
  • min_call_rate: only plot variants with at least this call rate (e.g. a call rate of 0.9 means 90% of samples have a non-missing call for this variant).
  • max_hets: only plot variants with at most this many heterozygotes in the VCF.
  • min_entries: if fewer than this many samples are HET or HOMALT for a variant, samplot vcf will attempt to add HOMREF samples to the plot as controls.
  • max_entries: do not plot more than this many HET or HOMALT samples for each variant.
  • max_mb: ignore variants longer than this many megabases. This mattered more in the past, before samplot plot could plot large variants using the zoom option.
  • important_regions: if an analysis is targeted to a specific set of regions (such as genes relevant to a phenotype), use this option to specify a BED file and only output SVs within that set of regions.
  • sample_ids: Space-delimited list of sample IDs in the same order as BAM/CRAM file inputs (to associate alignment files with fields in VCF). If BAM/CRAM files contain the RG tag which identifies the sample ID, this is unnecessary.
  • manual_run: print out samplot plot commands instead of running them immediately. This allows greater control over how the commands are run, for example with a tool such as GNU parallel.
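The manual_run workflow described in the list above might look like the following sketch. It rests on assumptions that should be checked against `samplot vcf -h` for the installed version: that the option is spelled `--manual_run`, and that the generated samplot plot commands are printed to standard output, one per line. The function names and the file name plot_cmds.txt are hypothetical. If GNU parallel is not installed, the sketch falls back to running the commands one after another.

```shell
# write_plot_cmds OUT_FILE: save the samplot plot commands instead of running them.
# Assumes --manual_run prints one command per line to standard output.
write_plot_cmds() {
    samplot vcf \
        --manual_run \
        -d test_site/ \
        --vcf test.vcf \
        --sample_ids HG002 HG003 HG004 \
        -b HG002_Illumina.bam HG003_Illumina.bam HG004_Illumina.bam \
        > "$1"
}

# run_plot_cmds CMD_FILE [JOBS]: run each line of CMD_FILE as a shell command.
# Uses GNU parallel with JOBS concurrent jobs (default 4) when it is available,
# otherwise runs the lines sequentially with sh.
run_plot_cmds() {
    cmd_file=$1
    n_jobs=${2:-4}
    if command -v parallel >/dev/null 2>&1 &&
       parallel --version 2>/dev/null | grep 'GNU parallel' >/dev/null; then
        # With no command argument, GNU parallel runs each input line as a command.
        parallel -j "$n_jobs" < "$cmd_file"
    else
        sh "$cmd_file"
    fi
}

# Usage (from test/data):
#   write_plot_cmds plot_cmds.txt
#   run_plot_cmds plot_cmds.txt 8
```

Saving the commands to a file first also makes it easy to inspect or edit them, or to rerun only the plots that failed.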

Any additional options used with samplot vcf will be passed on to samplot plot for image creation. This allows refined image customization with samplot vcf just like in samplot plot.

Customizable filters

Samplot vcf implements a simple filtering system for user-defined filters. This allows the user to filter beyond hard-coded options like those above and fit analyses to specific needs/VCF annotations.

Specify filters with the filter option. This option can be included multiple times to apply multiple filters. Filters given this way are combined with logical OR; that is, if a sample at a given variant does not pass one filter but does pass another, it will be plotted. For logical AND (i.e. a sample that passes one condition but not the next will not be plotted), combine the conditions with the & symbol in a single use of the filter option.

Example 1 (DEL or INV only):

samplot vcf \
    --filter "SVTYPE == 'DEL'" \
    --filter "SVTYPE == 'INV'" \
    -d test_site/ \
    --vcf test.vcf \
    --sample_ids HG002 HG003 HG004 \
    -b HG002_Illumina.bam HG003_Illumina.bam HG004_Illumina.bam

Example 2 (only DEL with at least 8 supporting reads, or INV with at least 5):

samplot vcf \
    --filter "SVTYPE == 'DEL' & SU >= 8" \
    --filter "SVTYPE == 'INV' & SU >= 5" \
    -d test_site/ \
    --vcf test.vcf \
    --sample_ids HG002 HG003 HG004 \
    -b HG002_Illumina.bam HG003_Illumina.bam HG004_Illumina.bam

Example 3 (passing on the zoom parameter with size 1000 bp):

samplot vcf \
    --zoom 1000 \
    -d test_site/ \
    --vcf test.vcf \
    --sample_ids HG002 HG003 HG004 \
    -b HG002_Illumina.bam HG003_Illumina.bam HG004_Illumina.bam