doc updates and Excel output fix
sigven committed May 27, 2024
1 parent ff08536 commit 7c1d904
Showing 5 changed files with 29 additions and 18 deletions.
4 changes: 2 additions & 2 deletions R/main.R
@@ -383,8 +383,6 @@ write_cpsr_output <- function(report,
workbook <- openxlsx2::wb_workbook() |>
openxlsx2::wb_add_worksheet(sheet = "VIRTUAL_PANEL") |>
openxlsx2::wb_add_worksheet(sheet = "CLASSIFICATION") |>
-openxlsx2::wb_add_worksheet(sheet = "BIOMARKER_EVIDENCE") |>
-openxlsx2::wb_add_worksheet(sheet = "SECONDARY_FINDINGS") |>
openxlsx2::wb_add_data_table(
sheet = "CLASSIFICATION",
x = dplyr::select(
@@ -415,6 +413,7 @@ write_cpsr_output <- function(report,

if(NROW(report[["content"]]$snv_indel$callset$variant$sf) > 0){
workbook <- workbook |>
+openxlsx2::wb_add_worksheet(sheet = "SECONDARY_FINDINGS") |>
openxlsx2::wb_add_data_table(
sheet = "SECONDARY_FINDINGS",
x = dplyr::select(
@@ -434,6 +433,7 @@ write_cpsr_output <- function(report,

if(NROW(report$content$snv_indel$callset$variant$bm) > 0){
workbook <- workbook |>
+openxlsx2::wb_add_worksheet(sheet = "BIOMARKER_EVIDENCE") |>
openxlsx2::wb_add_data_table(
sheet = "BIOMARKER_EVIDENCE",
x = dplyr::select(
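The hunks above appear to move creation of the *SECONDARY_FINDINGS* and *BIOMARKER_EVIDENCE* sheets into the blocks that run only when the corresponding table has rows, so that empty sheets are no longer written. A minimal sketch of that pattern follows, assuming `openxlsx2` is installed; `build_workbook()` and the toy tables are hypothetical and not part of CPSR:

```r
# Hypothetical helper (not CPSR code): add a worksheet only when
# the table destined for it has at least one row.
build_workbook <- function(tables) {
  wb <- openxlsx2::wb_workbook()
  for (sheet_name in names(tables)) {
    tbl <- tables[[sheet_name]]
    if (NROW(tbl) > 0) {
      wb <- wb |>
        openxlsx2::wb_add_worksheet(sheet = sheet_name) |>
        openxlsx2::wb_add_data_table(sheet = sheet_name, x = tbl)
    }
  }
  wb
}

# Toy input; values are made up for illustration
tables <- list(
  CLASSIFICATION = data.frame(
    SYMBOL = c("BRCA1", "MLH1"),
    FINAL_CLASSIFICATION = c("Pathogenic", "VUS")
  ),
  SECONDARY_FINDINGS = data.frame(SYMBOL = character(0)),
  BIOMARKER_EVIDENCE = data.frame(SYMBOL = "BRCA1", BM_RATING = 3)
)

wb <- build_workbook(tables)
sheet_names <- openxlsx2::wb_get_sheet_names(wb)
```

In this toy input *SECONDARY_FINDINGS* has zero rows, so the resulting workbook should contain only the other two sheets.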
8 changes: 4 additions & 4 deletions pkgdown/index.md
@@ -1,17 +1,17 @@
<br>

-## Cancer Predisposition Sequencing Reporter <a href="https://sigven.github.io/cpsr/"><img src="man/figures/logo.png" align="right" height="118" width="100"/></a>
+## Cancer Predisposition Sequencing Reporter <a href="https://sigven.github.io/cpsr/"><img src="man/figures/logo.png" align="right" height="106" width="90"/></a>

-<br>
+<br><br>

The *Cancer Predisposition Sequencing Reporter (CPSR)* is a computational workflow that **interprets germline variants** identified from next-generation sequencing **in the context of cancer predisposition**.

*CPSR* accepts a query file with raw germline variant calls (SNVs/InDels) from a single sample (cancer patient), encoded in the [VCF format ](https://samtools.github.io/hts-specs/VCFv4.2.pdf). CPSR conducts comprehensive gene and variant annotation on the input calls, and generates a dedicated _variant HTML report_, that provides the following main functionality:

1) Flexible **selection of cancer predisposition genes** subject to analysis
-2) **Variant classification** (*Pathogenic* to _Benign_) according to published guidelines (ACMG/AMP)
+2) **Variant classification** (*Pathogenic* to _Benign_) through implementation of ACMG guidelines
3) **Biomarker matching** of sample variants (prognosis, diagnosis, drug sensitivity/resistance)
-4) Potential **secondary/incidental findings** (ACMG recommendations)
+4) Reporting of **secondary/incidental findings** (ACMG recommendations)


The workflow is integrated with the framework that underlies [Personal Cancer Genome Reporter - PCGR ](https://github.com/sigven/pcgr). While *PCGR* is intended for reporting and analysis of somatic variants detected in a tumor, *CPSR* is intended for reporting and ranking of germline variants in protein-coding genes that are implicated in cancer predisposition and inherited cancer syndromes.
27 changes: 19 additions & 8 deletions vignettes/output.Rmd
@@ -22,7 +22,7 @@ The report is structured in multiple sections, described briefly below:
* Summarizes the main findings in the sample through value boxes

3. __Variant classification__
-* For all coding variants in the selected cancer predisposition geneset, interactive variant tables are shown for each level (__ClinVar__ and __non-ClinVar (Other)__ variants combined):
+* For all coding variants in the selected cancer predisposition geneset, interactive variant tables are shown for each level of clinical significance (__ClinVar__ and __non-ClinVar (Other)__ variants combined):
* Pathogenic
* Likely Pathogenic
* Variants of Uncertain Significance (VUS)
@@ -31,7 +31,7 @@ The report is structured in multiple sections, described briefly below:

4. __Genomic biomarkers__
* Reported clinical evidence items from [CIViC](https://civicdb.org) that match with variants in the query set are reported in four distinct tabs (Predictive / Prognostic / Diagnostic / Predisposing)
-- See section below for [details of biomarker annotations]()
+- See section below for [details of biomarker annotations](#biomarker-annotations)

5. __Secondary findings__
* Pathogenic variants in the [ACMG recommended list of genes for report of secondary/incidental findings](https://www.ncbi.nlm.nih.gov/clinvar/docs/acmg/)
@@ -50,6 +50,8 @@ The report is structured in multiple sections, described briefly below:
8. __References__
* Supporting scientific literature (knowledge resources, guideline references etc.)

+<br><br>
+
### Variant call format - VCF

A VCF file containing annotated, germline calls (single nucleotide variants and insertions/deletions) is generated with the following naming convention:
@@ -143,7 +145,7 @@ A VCF file containing annotated, germline calls (single nucleotide variants and
| `REFSEQ_PROTEIN_ID` | RefSeq protein/peptide identifier for VEP's picked transcript (*NP_XXXXXX*) |
| `TRANSCRIPT_MANE_SELECT` | MANE select transcript identifier: one high-quality representative transcript per protein-coding gene that is well-supported by experimental data and represents the biology of the gene |
| `TRANSCRIPT_MANE_PLUS_CLINICAL` | transcripts chosen to supplement MANE Select when needed for clinical variant reporting |
-| `GENCODE_TAG` | tag for gencode transcript (basic etc) |
+| `GENCODE_TAG` | tag for GENCODE transcript (basic etc) |
| `GENCODE_TRANSCRIPT_TYPE` | type of transcript (protein-coding etc.) |
| `TSG` | Indicates whether gene is predicted as a tumor suppressor gene, from Network of Cancer Genes (NCG) & the CancerMine text-mining resource |
| `TSG_SUPPORT` | Underlying evidence for gene being a tumor suppressor. Format: `CGC_TIER<1/2>&NCG&CancerMine:num_citations` |
@@ -154,7 +156,7 @@ A VCF file containing annotated, germline calls (single nucleotide variants and
| `CGC_SOMATIC` | Member of Cancer Gene Census - somatic set |
| `CGC_TIER` | Cancer Gene Census tier (1/2) |
| `NCG_DRIVER` | Cancer driver gene prediction by Network of Cancer Genes (NCG) |
-| `INTOGEN_DRIVER` | Indicates whether gene is predicted as cancer driver from IntoGen's cancer driver prediction algorithm |
+| `INTOGEN_DRIVER` | Indicates whether gene is predicted as cancer driver from IntOGen's cancer driver prediction algorithm |
| `PROB_EXAC_LOF_INTOLERANT` | `dbNSFP_gene`: the probability of being loss-of-function intolerant (intolerant of both heterozygous and homozygous lof variants) based on ExAC r0.3 data |
| `PROB_EXAC_LOF_INTOLERANT_HOM` | `dbNSFP_gene`: the probability of being intolerant of homozygous, but not heterozygous lof variants based on ExAC r0.3 data |
| `PROB_EXAC_LOF_TOLERANT_NULL` | `dbNSFP_gene`: the probability of being tolerant of both heterozygous and homozygous lof variants based on ExAC r0.3 data |
@@ -276,8 +278,13 @@ A VCF file containing annotated, germline calls (single nucleotide variants and

##### _Variant/genotype information_

-| 1. `GENOTYPE` | Variant genotype (*het*/*hom_ref*/*hom_alt*) |
-| 2. `DP_CONTROL` | Sequencing depth at variant site ('DP')|
+| Tag | Description |
+|-----|-------------|
+| `GENOTYPE` | Variant genotype (*het*/*hom_ref*/*hom_alt*) |
+| `DP_CONTROL` | Sequencing depth at variant site ('DP')|
+
+
+<br><br>

### Excel workbook - XLSX

@@ -292,6 +299,8 @@ The four sheets of the workbook contains the following:
- *BIOMARKER_EVIDENCE* - matches of variants with genomic biomarkers
- *SECONDARY_FINDINGS* - potential secondary findings

+<br><br>
+
### Tab-separated values - TSV

We provide a compressed tab-separated values file with variant classifications and the most essential variant/gene annotations. The file has the following naming convention:
@@ -359,7 +368,7 @@ The following variables are included in the tiered TSV file (VCF tags in the que
| 53. `N_INSILICO_SPLICING_NEUTRAL` | Number of algorithms with splicing neutral prediction from dbscSNV |
| 54. `N_INSILICO_SPLICING_AFFECTED` | Number of algorithms with splicing affected prediction from dbscSNV |
| 55. `gnomADe_AF` | Global MAF in gnomAD (exome samples) |
-| 56. `FINAL_CLASSIFICATION` | Final variant classification based on the combination of `CLINVAR_CLASSIFICTION` (for ClinVar-classified variants), and `CPSR_CLASSIFICATION` (for novel variants) |
+| 56. `FINAL_CLASSIFICATION` | Final variant classification, using either `CLINVAR_CLASSIFICATION` if variant is ClinVar-classified, or `CPSR_CLASSIFICATION` for novel variants |
| 57. `CPSR_CLASSIFICATION` | Variant clinical significance by CPSR's classification algorithm (P/LP/VUS/LB/B) |
| 58. `CPSR_PATHOGENICITY_SCORE` | Aggregated pathogenicity score by CPSR's algorithm |
| 59. `CPSR_CLASSIFICATION_CODE` | Combination of CPSR classification codes assigned to the variant (ACMG) |
@@ -368,6 +377,8 @@ The following variables are included in the tiered TSV file (VCF tags in the que

**NOTE**: The user has the possibility to append the TSV file with data from other INFO tags in the input VCF (i.e. using the *--retained_info_tags* option)
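
The `FINAL_CLASSIFICATION` rule described in the table (ClinVar's classification when one exists, CPSR's otherwise) can be sketched in base R as follows. The toy values and the use of `NA` to mark variants without a ClinVar classification are assumptions for illustration; the actual encoding in the TSV may differ.

```r
# Toy data: column names follow the TSV description, values are made up.
# Assumption: a variant without a ClinVar classification is encoded as NA.
variants <- data.frame(
  CLINVAR_CLASSIFICATION = c("Pathogenic", NA, "Benign", NA),
  CPSR_CLASSIFICATION = c("Likely_Pathogenic", "VUS", NA, "Likely_Benign"),
  stringsAsFactors = FALSE
)

# Prefer the ClinVar classification; fall back to CPSR's for novel variants
variants$FINAL_CLASSIFICATION <- ifelse(
  !is.na(variants$CLINVAR_CLASSIFICATION),
  variants$CLINVAR_CLASSIFICATION,
  variants$CPSR_CLASSIFICATION
)

print(variants)
```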

+<br><br>
+
### Biomarker annotations

The interactive HTML report (section *Genomic biomarkers*) and the Excel workbook (sheet *BIOMARKER_EVIDENCE*) contain information on matches between potential pathogenic/likely pathogenic sample variants and reported biomarkers, the latter referring to clinical evidence items that relate genomic aberrations to prognosis, diagnosis or sensitivity/resistance to particular treatments. All biomarker annotations are prefixed with **BM_**, and the following is provided per evidence item:
@@ -378,7 +389,7 @@ The interactive HTML report (section *Genomic biomarkers*) and the Excel workboo
| 2. `BM_DISEASE_ONTOLOGY_ID` | Disease ontology id for cancer type - from CIViC |
| 3. `BM_PRIMARY_SITE` | Primary tumor type of cancer type - mapped with [phenOncoX](https://github.com/sigven/phenOncoX) |
| 4. `BM_CLINICAL_SIGNIFICANCE` | Clinical significance of biomarker (drug sensitivity, drug resistance, poor outcome etc.) - from CIViC |
-| 5. `BM_THERAPEUTIC_CONTEXT` | Cancer drugs associated with biomarker (for biomarkers related to drug sensitivity/reistance) - from CIViC |
+| 5. `BM_THERAPEUTIC_CONTEXT` | Cancer drugs associated with biomarker (for biomarkers related to drug sensitivity/resistance) - from CIViC |
| 6. `BM_CITATION` | Reference/source for biomarker - i.e. publication or guidelines - from CIViC |
| 7. `BM_RATING` | Rating of biomarker - from CIViC |
| 8. `BM_MOLECULAR_PROFILE_NAME` | Associated name of molecular profile - i.e. "BRCA mutation" - from CIViC |
2 changes: 1 addition & 1 deletion vignettes/running.Rmd
@@ -209,7 +209,7 @@ $ (base) conda activate pcgr
$ (pcgr)
cpsr \
--input_vcf ~/cpsr-1.2/example.vcf.gz \
---vep_dir ~/.vep
+--vep_dir ~/.vep \
--refdata_dir ~/pcgr_ref_data \
--output_dir ~/cpsr-1.2 \
--genome_assembly grch37 \
6 changes: 3 additions & 3 deletions vignettes/virtual_panels.Rmd
@@ -9,12 +9,12 @@ The cancer predisposition report can show variants found in a number of well-kno

* **Panel 0** is a non-conservative, research-based _superpanel_ assembled through multiple sources on cancer predisposition genes:
* A list of 152 genes that were curated and established within TCGA’s pan-cancer study ([Huang et al., *Cell*, 2018](https://www.ncbi.nlm.nih.gov/pubmed/29625052))
-* A list of 114 protein-coding genes that has been manually curated in COSMIC’s [Cancer Gene Census v99](https://cancer.sanger.ac.uk/census),
+* A list of 113 protein-coding genes that has been manually curated in COSMIC’s [Cancer Gene Census v100](https://cancer.sanger.ac.uk/census),
* Genes from all [Genomics England PanelApp](https://panelapp.genomicsengland.co.uk/) panels for inherited cancers and tumor syndromes, as well as DNA repair genes (detailed below)
* Additional genes deemed relevant for cancer predisposition (i.e. contributed by CPSR users)


-The combination of the above sources resulted in a non-redundant set of **n = 563**
+The combination of the above sources resulted in a non-redundant set of **n = 562**
genes of relevance for cancer predisposition (see complete details [below](#panel-0))

Data with respect to mechanisms of inheritance (<i>MoI</i> - autosomal recessive (AR) vs. autosomal
@@ -73,7 +73,7 @@ The cancer predisposition report can show variants found in a number of well-kno

## Panel 0

-[Download the complete set of CPSR superpanel genes, grch37/grch38 versions (xlsx)](https://sigven.github.io/cpsr/cpsr_superpanel_2024_03.xlsx)
+[Download the complete set of CPSR superpanel genes, grch37/grch38 versions (xlsx)](https://sigven.github.io/cpsr/cpsr_superpanel_2024_05.xlsx)


<!--