Scripts of the pipeline implemented in the MPhil research project

MPhil in Genomic Medicine 2019-2020

University of Cambridge

Research Project Title:

Exploring Cancer Predisposition due to Mismatch Repair and Hereditary Diffuse Gastric Cancer-predisposing genes utilizing the 100,000 Genomes Project dataset

Study Aims:

The identification of germline variants predisposing to cancer plays an essential role in exploring the molecular pathogenesis of tumours and is a critical step towards the clinical management of affected families.

The aim of this association analysis was to explore the 100K Genomes Project dataset for possible associations between cancer and germline variants in the selected cancer-associated pathways. The implemented methods include burden and variance-component analysis of variants aggregated per gene, and aggregated analysis of the MMR pathway and the HDGC-associated genes. The 100K GP dataset provided a considerable amount of data and a well-equipped environment for this research.

The selected cancer-associated pathways were as follows:

MMR pathway: Heterozygous pathogenic variants in the MMR genes (MLH1, MSH2, MSH6, PMS2, MSH3, PMS1 and EPCAM) predispose to Lynch syndrome.
HDGC-related genes: Hereditary diffuse gastric cancer (HDGC) is an autosomal dominant cancer predisposition syndrome associated with pathogenic variants in the CDH1, CTNNA1, MAP3K6 and MYD88 genes.
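The per-gene and per-pathway aggregation described in the study aims can be illustrated with a minimal sketch. This is not the project's code: the project uses the SKAT library in R, whereas the example below swaps in a simpler collapsing test (carrier versus non-carrier, compared with a one-sided Fisher exact test) to show the idea of a burden analysis. The input format, the sample IDs and the function names `carriers`, `fisher_enrichment_p` and `burden_test` are assumptions made for illustration; only the gene lists come from the text above.

```python
from math import comb

# Gene sets copied from the pathway descriptions above.
PATHWAYS = {
    "MMR": {"MLH1", "MSH2", "MSH6", "PMS2", "MSH3", "PMS1", "EPCAM"},
    "HDGC": {"CDH1", "CTNNA1", "MAP3K6", "MYD88"},
}


def carriers(variant_calls, genes):
    """Return samples with at least one qualifying variant in `genes`.

    `variant_calls` is an iterable of (sample_id, gene) pairs
    (a hypothetical input format, not the project's).
    """
    return {sample for sample, gene in variant_calls if gene in genes}


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher exact p-value for carrier enrichment in cases.

    2x2 table: a = case carriers, b = case non-carriers,
               c = control carriers, d = control non-carriers.
    """
    n_cases, n_controls = a + b, c + d
    n_carriers = a + c
    denom = comb(n_cases + n_controls, n_carriers)
    upper = min(n_cases, n_carriers)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    tail = sum(
        comb(n_cases, k) * comb(n_controls, n_carriers - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


def burden_test(variant_calls, cases, controls, genes):
    """Collapse variants across `genes` per sample, then test cases vs controls."""
    carrier_set = carriers(variant_calls, genes)
    a = len(carrier_set & cases)
    c = len(carrier_set & controls)
    return fisher_enrichment_p(a, len(cases) - a, c, len(controls) - c)


# Toy data, invented for illustration only.
cases = {"c1", "c2", "c3", "c4"}
controls = {"n1", "n2", "n3", "n4"}
calls = [("c1", "MLH1"), ("c2", "MSH2"), ("c3", "MSH2"),
         ("c1", "MSH6"), ("n1", "CDH1")]

# Whole-pathway aggregation and a single-gene aggregation use the same function.
pathway_p = {name: burden_test(calls, cases, controls, genes)
             for name, genes in PATHWAYS.items()}
msh2_p = burden_test(calls, cases, controls, {"MSH2"})
```

Passing a single gene or a whole gene set to the same function mirrors the two levels of aggregation in the project. A regression-based burden test or a variance-component test such as SKAT additionally allows covariates and variant weights, which this sketch omits.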

Study objectives:

Explore the Genomics England (GEL) research environment.
Extract data from the latest main programme data release in LabKey server available in the Research Environment.
Select the sample cohort across multiple somatic cancers and cancer predisposition syndromes from the 100K GP participants.
Identify germline variants in the selected genes (MMR genes and HDGC-predisposing genes) across the selected cohorts.
Implement a range of statistical techniques to analyse variants aggregated per gene and per whole pathway.
Confirm previously established associations of genes and pathways with related types of cancer.
Examine if associations of each gene/pathway could be established with other types of cancer.

The main steps in the pipeline are to:

R import: Import, explore and prepare participants data from LabKey
Sample selection: Select cases and controls
Genes selection: Identify gene coordinates
VCF preparation: Extract sequencing data from gVCF
Annotation: Variant annotation with Ensembl VEP, CADD and ClinVar
QC and filtering: QC checks and functional filtering of variants
Data consolidation: Load VCF and phenotypic data into R and apply additional genotype filtering
SKAT preparation: Prepare the files required by the SKAT library
Association analysis: Test for associations using SKAT library
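As an illustration of the "QC and filtering" step listed above, the following sketch shows one way a functional filter over annotated variants might look. It is a sketch under stated assumptions, not the filter used in the project: the `Variant` record, the thresholds `MAX_AF` and `MIN_CADD`, and the rule combining consequence, ClinVar and CADD annotations are all hypothetical. The consequence strings follow the Sequence Ontology terms reported by Ensembl VEP.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the project's actual cut-offs are not stated in this README.
MAX_AF = 0.01      # keep variants rarer than 1% in the reference population
MIN_CADD = 20.0    # minimum CADD Phred score for missense variants

# Consequences treated as loss of function in this sketch.
LOF_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}
PATHOGENIC_CLINVAR = {"Pathogenic", "Likely_pathogenic"}


@dataclass
class Variant:
    """Minimal annotated-variant record (illustrative, not the project's schema)."""
    variant_id: str
    gene: str
    consequence: str
    cadd_phred: Optional[float]
    clinvar: Optional[str]
    max_af: Optional[float]  # None means the variant is absent from the reference


def passes_filter(v: Variant) -> bool:
    """Keep rare variants that are LoF, ClinVar pathogenic, or high-CADD missense."""
    is_rare = v.max_af is None or v.max_af < MAX_AF
    if not is_rare:
        return False
    if v.consequence in LOF_CONSEQUENCES:
        return True
    if v.clinvar in PATHOGENIC_CLINVAR:
        return True
    return (
        v.consequence == "missense_variant"
        and v.cadd_phred is not None
        and v.cadd_phred >= MIN_CADD
    )


# Toy variants, invented for illustration only.
variants = [
    Variant("v1", "MLH1", "stop_gained", None, None, None),
    Variant("v2", "MSH6", "missense_variant", 25.1, None, 0.0002),
    Variant("v3", "MSH6", "missense_variant", 12.0, None, 0.0002),
    Variant("v4", "CDH1", "synonymous_variant", 5.0, "Benign", 0.2),
    Variant("v5", "PMS2", "missense_variant", 30.0, None, 0.05),
    Variant("v6", "CDH1", "missense_variant", 10.0, "Likely_pathogenic", 0.0001),
]

kept_ids = [v.variant_id for v in variants if passes_filter(v)]
```

The variants that survive such a filter are the ones that would then be aggregated per gene or per pathway in the association step.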

The repository was set up to support the MPhil research project by documenting the details of the project pipeline. The repository is public, and the scripts are adapted to the Genomics England Research Environment.

Folders and files:

s01_import_Labkey_tables
s02_explore_labkey_data
s03_select_sample
s04_genes_coordinates
s05_make_vcf
s06_annotate_vcf
s07_variants_qc_and_filter
s08_consolidate_dataset
s09_sporadic_colon_cancer
.gitignore
Numbers progress.xlsx
README.md
Results table draft.xlsx
variants_166.xlsx

rofaida-desoki/MPhil-Genomic-Medicine
