doc updates and Excel output fix

sigven · May 27, 2024 · 7c1d904
1 parent ff08536
Showing 5 changed files with 29 additions and 18 deletions.
diff --git a/R/main.R b/R/main.R
@@ -383,8 +383,6 @@ write_cpsr_output <- function(report,
     workbook <- openxlsx2::wb_workbook() |>
       openxlsx2::wb_add_worksheet(sheet = "VIRTUAL_PANEL") |>
       openxlsx2::wb_add_worksheet(sheet = "CLASSIFICATION") |>
-      openxlsx2::wb_add_worksheet(sheet = "BIOMARKER_EVIDENCE") |>
-      openxlsx2::wb_add_worksheet(sheet = "SECONDARY_FINDINGS") |>
       openxlsx2::wb_add_data_table(
         sheet = "CLASSIFICATION",
         x = dplyr::select(
@@ -415,6 +413,7 @@ write_cpsr_output <- function(report,
 
     if(NROW(report[["content"]]$snv_indel$callset$variant$sf) > 0){
       workbook <- workbook |>
+        openxlsx2::wb_add_worksheet(sheet = "SECONDARY_FINDINGS") |>
         openxlsx2::wb_add_data_table(
         sheet = "SECONDARY_FINDINGS",
         x = dplyr::select(
@@ -434,6 +433,7 @@ write_cpsr_output <- function(report,
 
     if(NROW(report$content$snv_indel$callset$variant$bm) > 0){
       workbook <- workbook |>
+        openxlsx2::wb_add_worksheet(sheet = "BIOMARKER_EVIDENCE") |>
         openxlsx2::wb_add_data_table(
           sheet = "BIOMARKER_EVIDENCE",
           x = dplyr::select(

diff --git a/pkgdown/index.md b/pkgdown/index.md
@@ -1,17 +1,17 @@
 <br>
 
-## Cancer Predisposition Sequencing Reporter <a href="https://sigven.github.io/cpsr/"><img src="man/figures/logo.png" align="right" height="118" width="100"/></a>
+## Cancer Predisposition Sequencing Reporter <a href="https://sigven.github.io/cpsr/"><img src="man/figures/logo.png" align="right" height="106" width="90"/></a>
 
-<br>
+<br><br>
 
 The *Cancer Predisposition Sequencing Reporter (CPSR)* is a computational workflow that **interprets germline variants** identified from next-generation sequencing **in the context of cancer predisposition**. 
 
 *CPSR* accepts a query file with raw germline variant calls (SNVs/InDels) from a single sample (cancer patient), encoded in the [VCF format ](https://samtools.github.io/hts-specs/VCFv4.2.pdf). CPSR conducts comprehensive gene and variant annotation on the input calls, and generates a dedicated _variant HTML report_, that provides the following main functionality:
 
 1) Flexible **selection of cancer predisposition genes** subject to analysis
-2) **Variant classification** (*Pathogenic* to _Benign_) according to published guidelines (ACMG/AMP)
+2) **Variant classification** (*Pathogenic* to _Benign_) through implementation of ACMG guidelines
 3) **Biomarker matching** of sample variants (prognosis, diagnosis, drug sensitivity/resistance)
-4) Potential **secondary/incidental findings** (ACMG recommendations)
+4) Reporting of **secondary/incidental findings** (ACMG recommendations)
 
 
 The workflow is integrated with the framework that underlies [Personal Cancer Genome Reporter - PCGR ](https://github.com/sigven/pcgr). While *PCGR* is intended for reporting and analysis of somatic variants detected in a tumor, *CPSR* is intended for reporting and ranking of germline variants in protein-coding genes that are implicated in cancer predisposition and inherited cancer syndromes.

diff --git a/vignettes/output.Rmd b/vignettes/output.Rmd
@@ -22,7 +22,7 @@ The report is structured in multiple sections, described briefly below:
      * Summarizes the main findings in the sample through value boxes
 
   3. __Variant classification__
-     * For all coding variants in the selected cancer predisposition geneset, interactive variant tables are shown for each level (__ClinVar__ and __non-ClinVar (Other)__ variants combined):
+     * For all coding variants in the selected cancer predisposition geneset, interactive variant tables are shown for each level of clinical significance (__ClinVar__ and __non-ClinVar (Other)__ variants combined):
 	      * Pathogenic
 	      * Likely Pathogenic
 	      * Variants of Uncertain Significance (VUS)
@@ -31,7 +31,7 @@ The report is structured in multiple sections, described briefly below:
 
   4. __Genomic biomarkers__
      * Reported clinical evidence items from [CIViC](https://civicdb.org) that match with variants in the query set are reported in four distinct tabs (Predictive / Prognostic / Diagnostic / Predisposing)
-        - See section below for [details of biomarker annotations]()
+        - See section below for [details of biomarker annotations](#biomarker-annotations)
 
   5. __Secondary findings__
      * Pathogenic variants in the [ACMG recommended list of genes for report of secondary/incidental findings](https://www.ncbi.nlm.nih.gov/clinvar/docs/acmg/)
@@ -50,6 +50,8 @@ The report is structured in multiple sections, described briefly below:
   8. __References__
 	    * Supporting scientific literature - knowledge resources, guideline references etc.)
 
+<br><br>
+
 ### Variant call format - VCF
 
 A VCF file containing annotated, germline calls (single nucleotide variants and insertions/deletions) is generated with the following naming convention:
@@ -143,7 +145,7 @@ A VCF file containing annotated, germline calls (single nucleotide variants and
 | `REFSEQ_PROTEIN_ID` | RefSeq protein/peptide identifier for VEP's picked transcript (*NP_XXXXXX*) |
 | `TRANSCRIPT_MANE_SELECT` | MANE select transcript identifier: one high-quality representative transcript per protein-coding gene that is well-supported by experimental data and represents the biology of the gene |
 | `TRANSCRIPT_MANE_PLUS_CLINICAL` | transcripts chosen to supplement MANE Select when needed for clinical variant reporting |
-| `GENCODE_TAG` | tag for gencode transcript (basic etc) |
+| `GENCODE_TAG` | tag for GENCODE transcript (basic etc) |
 | `GENCODE_TRANSCRIPT_TYPE` | type of transcript (protein-coding etc.) |
 | `TSG` | Indicates whether gene is predicted as a tumor suppressor gene, from Network of Cancer Genes (NCG) & the CancerMine text-mining resource |
 | `TSG_SUPPORT` | Underlying evidence for gene being a tumor suppressor. Format: `CGC_TIER<1/2>&NCG&CancerMine:num_citations"` |
@@ -154,7 +156,7 @@ A VCF file containing annotated, germline calls (single nucleotide variants and
 | `CGC_SOMATIC` | Member of Cancer Gene Census - somatic set |
 | `CGC_TIER` | Cancer Gene Census tier (1/2) |
 | `NCG_DRIVER` | Cancer driver gene prediction by Network of Cancer Genes (NCG) |
-| `INTOGEN_DRIVER` | Indicates whether gene is predicted as cancer driver from IntoGen's cancer driver prediction algorithm |
+| `INTOGEN_DRIVER` | Indicates whether gene is predicted as cancer driver from IntOGen's cancer driver prediction algorithm |
 | `PROB_EXAC_LOF_INTOLERANT` | `dbNSFP_gene`: the probability of being loss-of-function intolerant (intolerant of both heterozygous and homozygous lof variants) based on ExAC r0.3 data |
 | `PROB_EXAC_LOF_INTOLERANT_HOM` | `dbNSFP_gene`: the probability of being intolerant of homozygous, but not heterozygous lof variants based on ExAC r0.3 data |
 | `PROB_EXAC_LOF_TOLERANT_NULL` | `dbNSFP_gene`: the probability of being tolerant of both heterozygous and homozygous lof variants based on ExAC r0.3 data |
@@ -276,8 +278,13 @@ A VCF file containing annotated, germline calls (single nucleotide variants and
 
 ##### _Variant/genotype information_
 
-| 1. `GENOTYPE` | Variant genotype (*het*/*hom_ref*/*hom_alt*) |
-| 2. `DP_CONTROL` | Sequencing depth at variant site ('DP')|
+| Tag | Description |
+|-----|-------------|
+| `GENOTYPE` | Variant genotype (*het*/*hom_ref*/*hom_alt*) |
+| `DP_CONTROL` | Sequencing depth at variant site ('DP')|
+
+
+<br><br>
 
 ### Excel workbook - XLSX
 
@@ -292,6 +299,8 @@ The four sheets of the workbook contains the following:
  - *BIOMARKER_EVIDENCE* - matches of variants with genomic biomarkers
  - *SECONDARY_FINDINGS* - potential secondary findings
 
+<br><br>
+
 ### Tab-separated values - TSV
 
 We provide a compressed tab-separated values file with variant classifications and the most essential variant/gene annotations. The file has the following naming convention:
@@ -359,7 +368,7 @@ The following variables are included in the tiered TSV file (VCF tags in the que
 | 53. `N_INSILICO_SPLICING_NEUTRAL` | Number of algorithms with splicing neutral prediction from dbscSNV |
 | 54. `N_INSILICO_SPLICING_AFFECTED` | Number of algorithms with splicing affected prediction from dbscSNV |
 | 55. `gnomADe_AF` | Global MAF in gnomAD (exome samples) |
-| 56. `FINAL_CLASSIFICATION` | Final variant classification based on the combination of `CLINVAR_CLASSIFICTION` (for ClinVar-classified variants), and `CPSR_CLASSIFICATION` (for novel variants) |
+| 56. `FINAL_CLASSIFICATION` | Final variant classification, using either `CLINVAR_CLASSIFICATION` if variant is ClinVar-classified, or `CPSR_CLASSIFICATION` for novel variants |
 | 57. `CPSR_CLASSIFICATION` | Variant clinical significance by CPSR's classification algorithm (P/LP/VUS/LB/B) |
 | 58. `CPSR_PATHOGENICITY_SCORE` | Aggregated pathogenicity score by CPSR's algorithm |
 | 59. `CPSR_CLASSIFICATION_CODE` | Combination of CPSR classification codes assigned to the variant (ACMG) |
@@ -368,6 +377,8 @@ The following variables are included in the tiered TSV file (VCF tags in the que
 
 **NOTE**: The user has the possibility to append the TSV file with data from other INFO tags in the input VCF (i.e. using the *--retained_info_tags* option)
 
+<br><br>
+
 ### Biomarker annotations
 
 The interactive HTML report (section *Genomic biomarkers*) and the Excel workbook (sheet *BIOMARKER_EVIDENCE*) contain information on matches between potential pathogenic/likely pathogenic sample variants and reported biomarkers, the latter referring to clinical evidence items that relate genomic aberrations to prognosis, diagnosis or sensitivity/resistance to particular treatments. All biomarker annotations are prefixed with **BM_**, and the following is provided per evidence item:
@@ -378,7 +389,7 @@ The interactive HTML report (section *Genomic biomarkers*) and the Excel workboo
 | 2. `BM_DISEASE_ONTOLOGY_ID` | Disease ontology id for cancer type - from CIViC |
 | 3. `BM_PRIMARY_SITE` | Primary tumor type of cancer type - mapped with [phenOncoX](https://github.com/sigven/phenOncoX) |
 | 4. `BM_CLINICAL_SIGNIFICANCE` | Clinical significance of biomarker (drug sensitivity, drug resistance, poor outcome etc.) - from CIViC |
-| 5. `BM_THERAPEUTIC_CONTEXT` | Cancer drugs associated with biomarker (for biomarkers related to drug sensitivity/reistance) - from CIViC |
+| 5. `BM_THERAPEUTIC_CONTEXT` | Cancer drugs associated with biomarker (for biomarkers related to drug sensitivity/resistance) - from CIViC |
 | 6. `BM_CITATION` | Reference/source for biomarker - i.e. publication or guidelines - from CIViC |
 | 7. `BM_RATING` | Rating of biomarker - from CIViC |
 | 8. `BM_MOLECULAR_PROFILE_NAME` | Associated name of molecular profile - i.e. "BRCA mutation" - from CIViC |

diff --git a/vignettes/running.Rmd b/vignettes/running.Rmd
@@ -209,7 +209,7 @@ $ (base) conda activate pcgr
 $ (pcgr)
 cpsr \
 	 --input_vcf ~/cpsr-1.2/example.vcf.gz \
-	 --vep_dir ~/.vep
+	 --vep_dir ~/.vep \
 	 --refdata_dir ~/pcgr_ref_data \
 	 --output_dir ~/cpsr-1.2 \
 	 --genome_assembly grch37 \

diff --git a/vignettes/virtual_panels.Rmd b/vignettes/virtual_panels.Rmd
@@ -9,12 +9,12 @@ The cancer predisposition report can show variants found in a number of well-kno
 
   * **Panel 0** is a non-conservative, research-based _superpanel_ assembled through multiple sources on cancer predisposition genes:
 	* A list of 152 genes that were curated and established within TCGA’s pan-cancer study ([Huang et al., *Cell*, 2018](https://www.ncbi.nlm.nih.gov/pubmed/29625052))
-	* A list of 114 protein-coding genes that has been manually curated in COSMIC’s [Cancer Gene Census v99](https://cancer.sanger.ac.uk/census),
+	* A list of 113 protein-coding genes that has been manually curated in COSMIC’s [Cancer Gene Census v100](https://cancer.sanger.ac.uk/census),
 	* Genes from all [Genomics England PanelApp](https://panelapp.genomicsengland.co.uk/) panels for inherited cancers and tumor syndromes, as well as DNA repair genes (detailed below)
 	* Additional genes deemed relevant for cancer predisposition (i.e. contributed by CPSR users)
 
 
-	The combination of the above sources resulted in a non-redundant set of **n = 563**
+	The combination of the above sources resulted in a non-redundant set of **n = 562**
 	genes of relevance for cancer predisposition (see complete details [below](#panel-0))
 
 	Data with respect to mechanisms of inheritance (<i>MoI</i> - autosomal recessive (AR) vs. autosomal
@@ -73,7 +73,7 @@ The cancer predisposition report can show variants found in a number of well-kno
 
 ## Panel 0
 
-[Download the complete set of CPSR superpanel genes, grch37/grch38 versions (xlsx)](https://sigven.github.io/cpsr/cpsr_superpanel_2024_03.xlsx)
+[Download the complete set of CPSR superpanel genes, grch37/grch38 versions (xlsx)](https://sigven.github.io/cpsr/cpsr_superpanel_2024_05.xlsx)
 
 
 <!--