Descriptions of the workshops and talks

Saturday

Introduction to HGVS Nomenclature and the Python hgvs package

Reece Hart

The HGVS sequence variant nomenclature is a set of recommendations for presenting biological sequence variants to humans. Unfortunately, humans aren’t very good at distinguishing formats that are convenient for them from representations that are convenient for computers. As a result, humans put HGVS variant strings into databases and web pages and clinical reports, making it difficult to compute on these variants. The Python hgvs package (Apache 2.0 license) parses, formats, validates, and shifts/normalizes variants, and projects (maps) variants between aligned sequences. An important distinguishing feature of the hgvs package is that it correctly handles cases in which the genomic and transcript sequences differ by substitutions or indels.

This workshop will introduce the hgvs package and help attendees get started with it. To get the most out of the workshop, attendees should bring a laptop with Python installed or with the ability to run Docker containers.
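To illustrate the gap between a human-readable variant string and a structure a program can compute on, here is a minimal sketch using only the Python standard library. It is not the hgvs package and does not use its API: the regular expression, the `SimpleVariant` class, and the `split_hgvs` function are hypothetical names introduced for illustration, and they cover only simple substitutions written like `NM_001637.3:c.1582G>A`. The real nomenclature (indels, intronic offsets, ranges, uncertainty) is far richer, which is why a dedicated parser is needed.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern (an assumption for illustration, not the hgvs grammar):
# accession, coordinate type, integer position, then ref>alt.
_SUBSTITUTION = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+):"
    r"(?P<kind>[cgmn])\."
    r"(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


@dataclass(frozen=True)
class SimpleVariant:
    """Structured form of a simple substitution (hypothetical helper class)."""

    accession: str
    kind: str
    position: int
    ref: str
    alt: str

    def __str__(self) -> str:
        # Format back into the human-readable string.
        return f"{self.accession}:{self.kind}.{self.position}{self.ref}>{self.alt}"


def split_hgvs(text: str) -> SimpleVariant:
    """Split a simple substitution string into typed fields.

    Raises ValueError for anything the toy pattern does not cover.
    """
    match = _SUBSTITUTION.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    fields = match.groupdict()
    return SimpleVariant(
        accession=fields["accession"],
        kind=fields["kind"],
        position=int(fields["position"]),
        ref=fields["ref"],
        alt=fields["alt"],
    )
```

Once the string is split into fields, the position is an integer that can be compared or shifted, and `str(variant)` round-trips to the original text. An intronic variant such as `c.22+1A>T` is rejected by this sketch, which shows how quickly a hand-rolled pattern runs out.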

Related: hgvs-eval project

Estimated time: ~45 minutes | Draft Slides | Pre-workshop Prep

Gitting Git: Beginner workshop with Git and GitHub

Amanjeev Sethi

The plan is to host a workshop covering Git basics for beginners, with exercises we work through together to get hands-on experience with the most commonly used commands.

Assumptions/prerequisites:

  • All participants have Git installed on their machines
  • All participants have a GitHub account
  • All participants have command line access to Git
  • All participants preferably have SSH keys set up

Helpful links: Git installation, GitHub website (create account), and setting up an SSH key if you do not want to type your GitHub password again and again.

Estimated time: ~60 minutes | Workshop resources: Slides and GitHub repo

Sunday

Bioinformatic solutions for variant calling within segmental duplications

Dan Kvitek

Segmental duplications (SegDups) are long stretches of nearly identical DNA sequence that occur two or more times in the genome, arising from duplication events that appear fixed in the population. SegDups complicate variant calling from short-read next-generation sequencing (NGS) data because reads cannot be aligned unambiguously when SegDup regions share sequence identity. This makes clinical-grade variant calls in these regions difficult or impossible using NGS data and current variant identification approaches.

We have developed a general method for identifying SNPs, indels and CNVs within SegDups. The method consists of two steps. The first step is a bioinformatics screen using NGS data, in which sequence reads derived from both the clinical gene of interest and the SegDup are analyzed for the presence of variants using a reference sequence with the gene of interest present but the SegDup(s) masked. If no variants of interest are identified in the screen, no further work is required, and the sample is negative for variants across that gene. The second step applies only when variants of interest are identified: the variant location is disambiguated, if necessary, with additional laboratory assays such as long-range PCR or MLPA. This method eliminates the need to perform additional laboratory assays on screen-negative samples, significantly reducing the time and cost of variant identification, and enables variant identification in parts of the genome where it was previously impossible.

We have applied this method successfully to several genes, including the Lynch syndrome gene PMS2 and its pseudogene PMS2CL, an internal 8-exon triplication of the myopathy gene NEB, and the spinal muscular atrophy genes SMN1/2. We are in the process of applying this method to other genes of clinical importance that have SegDups elsewhere in the genome. Among them are the autosomal dominant polycystic kidney disease gene PKD1, the Shwachman-Diamond syndrome gene SBDS, the incontinentia pigmenti gene IKBKG (NEMO), the Koolen-de Vries syndrome gene KANSL1, and the Gaucher disease gene GBA. In the talk, I will go through the problem with SegDups, show the general bioinformatic method we developed, then show specific examples of genes, their read alignment issues, and the solution.
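The two-step logic described above can be sketched as a small triage function. This is only an illustration of the decision flow, not the speaker's implementation: the `ScreenCall` class, the `triage_sample` function, and the status strings are hypothetical names, and the variant calling against the SegDup-masked reference is assumed to have happened upstream.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScreenCall:
    """One variant call from the screen (hypothetical structure).

    The call is assumed to come from reads aligned to a reference in which
    the gene of interest is present and the SegDup copies are masked, so the
    call could originate from either the gene or a SegDup copy.
    """

    gene: str
    description: str
    of_interest: bool


def triage_sample(calls: List[ScreenCall]) -> str:
    """Decide what happens to a sample after the step-one screen.

    Returns "negative" when the screen finds no variants of interest, so no
    further laboratory work is needed. Returns "needs_disambiguation" when at
    least one variant of interest is present and its true location may have
    to be resolved by a follow-up assay (step two).
    """
    flagged = [call for call in calls if call.of_interest]
    if not flagged:
        return "negative"
    return "needs_disambiguation"
```

The point of the design is that only samples reaching the second branch incur the cost of follow-up assays such as long-range PCR or MLPA; screen-negative samples stop after the first step.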

Estimated time: 30-60 minutes | Slides: Available later

Making R packages accessible to non-programmer collaborators using the VisRseq platform

Hamid Younesy

The goal of this workshop is to introduce VisRseq, a software platform for analysis and visualization of NGS data. We will walk through the quick process of creating modules, called R-apps, from R packages. This should be of most interest to bioinformaticians and package developers who build R-based analysis tools and would like to make them accessible to their non-programmer collaborators or to the public without having to spend time creating extensive graphical user interfaces. We will create diverse types of apps, from simple plotting (e.g. ggplot) to intermediate (e.g. clustering) to more advanced (e.g. edgeR and DESeq) packages. We will also see how several R-apps can be linked together to create more complex workflows. Participants will need beginner knowledge of R and a machine with R and Java installed.

Estimated time: 90-120 minutes | Workshop resources: Slides and workshop materials

Monday

Can My Computer Read Those Papers For Me?

Jake Lever

How easy is it to build a new knowledgebase from scratch for a particular biological problem? I'll explore why you might want to do this and what technology exists for this problem. I'll focus on our approach using the VERSE tool and the concepts behind it.

Estimated time: 20-30 minutes | Slides: Google Slides

The ENCODE Encyclopedia: The Ultimate User's Guide

Jill E. Moore

The Encyclopedia of DNA Elements (ENCODE) Consortium has generated hundreds of high-throughput genomic datasets with the goal of cataloging functional elements in the human genome. Our goal was to integrate these complex data types to create an Encyclopedia that can be used by the wider research community. This workshop will cover components of the Encyclopedia, from ground-level annotations such as gene expression and transcription factor binding sites to middle- and top-level annotations such as candidate enhancers and their linked genes. The workshop will include a hands-on tutorial in which users can search for regions of interest, investigate regulatory elements, and visualize these regions on the genome browser.

Estimated time: ~30 minutes | Slides: Available later