Skip to content

Visualizing Linkage Disequilibrium using 1000 Genomes Data

Notifications You must be signed in to change notification settings

joe-nas/haploplotR

Repository files navigation

haploplotR: Project to visualize linkage disequilibrium from 1000 genomes data

rs2235371

Example plot using the IRF6 locus. Red bars in the population panels represent tag snps, snps wich are in linkage disequilibrium (rsquared >= .8) with lead snps indicated by blue bars. The p63BS panel indicates p63 binding sites which overlap with lead or tag snps (red) in at least one population and therefore are at least in part in linkage disequilibrium with the lead snp. Or as indicated by black bars are not in linkage disequilibrium with a lead or tag snp.

For this particular locus lead snps are snps indicated by gwas focusing on non syndromic cleft lip with or without cleft palate (NSCL/P). Given that gwas only indicates a genomic region, further study of that region needs to be done. By analysing linkage disequilibrium in that region and bringing it into context with p63, a tf regulating facial development one might find regulatory regions like enhancers being involved in NSCL/P.

requirements:

How to:

First a few dependencies and useful objects need to be loaded

#loading dependencies
library(devtools)
library(GenomicRanges)
library(grid)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(Homo.sapiens)
devtools::load_all("path_to/haploplotR")

Next we specify 1000genomes vcf files and panel file paths. Furthermore target snps and populations are defined.

# specify 1000genomes vcf files and panel file paths
vcffile<-"path_to/ALL.%s.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"
panelfile<-"path_to/integrated_call_samples_v3.20130502.ALL.panel"
 
# create vectors defining lead_snps as well as 1000genomes population identifiers

lead_snps<-c("rs79997038","rs79084855","rs560426")
populations <- c("CEU",'CHB',"LWK")

In order to analyse linkage disequilibrium only in a window around our lead_snp we need to define a window around lead_snps. Here we take the lead_snps identifiers and create a GRanges object from it and subsequently resize the genomic position to a window. For compatibility reason we set the seqlevelsStyle to "UCSC"

lead_snps_gr<-GRanges(snpsById(SNPlocs.Hsapiens.dbSNP144.GRCh37,lead_snps))
analysis_intervals_gr<-resize(lead_snps_gr,fix='center',width=6.5e5)
seqlevelsStyle(lead_snps_gr)<-"UCSC"
seqlevelsStyle(analysis_intervals_gr)<-"UCSC"

Next we put our ingredients into the LDABase, LDAImport and LDAnalysis1 objects. Here for each of the analysis intervals and populations which are defined prior we import our data using the LDAImport object and by applying the $new() method. Subsequently, LDAnalysis1$new() is called. Here, we input the vector of lead_snps, define an rsquared cutoff which defaults to .8 and use our data as imported by LDAImport. We generate results using the LDAnlaysis1$set_rsquared() and set_results() methods to calculate and filter which SNPs in the analysis windows are in linkage disequilibrium with the lead_snps. by meeting the cutoff value. Here the set_rsquared() method calculates all rsquared values for lead snps with all other snps in the analysis window. set_results() filters the rsquared data to meet the cutoff value.

ldabase <- LDABase$new()
# workflow for only one interval/ single locus
dat <- LDAImport$new(ldabase = ldabase, granges = analysis_intervals_gr[1], 
                     populations = populations)$set_data()
res <- LDAnalysis1$new(lead_snps = lead_snps, cutoff = 0.8, lda_import = dat)
res$set_rsquared()
res$set_results()

The results can then be accessed by using res$results. res$results holds a Granges object containing the positions of tag_snps annotated with:

  • lead_snp_name
  • tag_snp_name
  • rsquared
  • population
  • lead_snp_pos

To do:

  • import1000GData.R needs a rewrite using VariantAnnotation as WhopGenome is no longer maintained
  • expand the how to, elaborate on how to visualize the data
  • tidy up roxygen
  • tidy up or remove:
    • obsolete/analysis.R
    • obsolete/liloanalysis.R
    • obsolete/testscript.R
    • obsolete/LDAnalysis_class.R

About

Visualizing Linkage Disequilibrium using 1000 Genomes Data

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published