docs update

sigven · May 30, 2024 · 31dbff9
1 parent 83aaae3
Showing 7 changed files with 78 additions and 49 deletions.
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # Cancer Predisposition Sequencing Reporter <a href="https://sigven.github.io/cpsr/"><img src="man/figures/logo.png" align="right" height="118" width="100"/></a>
 
-The *Cancer Predisposition Sequencing Reporter (CPSR)* is a computational workflow that **interprets and classifies germline DNA variants** identified from next-generation sequencing **in the context of cancer predisposition and inherited cancer syndromes**. The workflow can also report **incidental findings (ACMG v3.0)** as well as the genotypes of common germline variants associated with cancer risk, as reported in the **NHGRI-EBI GWAS catalog**.
+The *Cancer Predisposition Sequencing Reporter (CPSR)* is a computational workflow that **interprets and classifies the clinical significance of germline DNA variants** identified from next-generation sequencing **in the context of cancer predisposition and inherited cancer syndromes**. The workflow can also report **incidental findings (ACMG v3.2)**.
 
 The CPSR workflow is integrated with the framework that underlies the [Personal Cancer Genome Reporter - PCGR](https://github.com/sigven/pcgr). While *PCGR* is intended for reporting and analysis of somatic variants detected in a tumor, *CPSR* is intended for reporting and ranking of germline variants in protein-coding genes that are implicated in cancer predisposition and inherited cancer syndromes. 
 
@@ -12,11 +12,21 @@ Snapshots of sections in the cancer predisposition genome report:
 
 ## News
 
+-   *May 2024*: **2.x.x release**
+    -   New HTML report generation and layout with [quarto](https://quarto.org/)
+    -   Excel output supported
+    -   Updated virtual gene panels (Genomics England PanelApp, Cancer Gene Census)
+    -   Reference data updates, most importantly including 
+        - ClinVar - May 2024
+        - CIViC - May 2024
+        - GENCODE - v45
+    -   Software updates - VEP 111
+    -   Extensive code clean-up and re-structuring
+
 -   *November 2022*: **1.0.1 release**
     -   Added CPSR logo (designed by [Hal Nakken](https://halvetica.net))
 
 -   *February 2022*: **1.0.0 release**
-
     -   Complete restructure of code and Conda installation routines, contributed largely by the great [@pdiakumis](https://github.com/pdiakumis)
     -   Updated data bundle
         - ClinVar - Feb 2022
@@ -27,16 +37,6 @@ Snapshots of sections in the cancer predisposition genome report:
     -   Software upgrade (VEP 105, R/BioConductor)
     -   New documentation site ([https://sigven.github.io/cpsr](https://sigven.github.io/cpsr))
 
--   *June 30th 2021*: **0.6.2 release**
-
-    -   Updated bundle (ClinVar, CancerMine, UniprotKB, PanelApp, CIViC, GWAS catalog)
-    -   Software upgrade (VEP, R/BioConductor)
-    -   [CHANGELOG](http://cpsr.readthedocs.io/en/latest/CHANGELOG.html)
-
--   *November 30th 2020*: **0.6.1 release**
-
-    -   Updated bundle (ClinVar, CancerMine, UniprotKB, CIViC, GWAS catalog)
-    -   [CHANGELOG](http://cpsr.readthedocs.io/en/latest/CHANGELOG.html)
 
 ## Example report
 

diff --git a/pkgdown/_pkgdown.yml b/pkgdown/_pkgdown.yml
@@ -25,8 +25,8 @@ navbar:
   type: light
   bg: info
   structure:
-    left: [installation, running, reference, articles]
-    right: [search, changelog, github]
+    left: [installation, running, articles, changelog]
+    right: [search, reference, github]
   components:
     home:
       text: Intro

diff --git a/pkgdown/index.md b/pkgdown/index.md
@@ -20,11 +20,20 @@ Snapshots of sections in the cancer predisposition genome report:
 
 ![](img/cpsr_views.png)
 
+<br>
 
 ### News
 
-* *March 2024*: **1.xxx release**
-  * Major bundle update
+* *May 2024*: **2.x.x release**
+  - New HTML report generation and layout with [quarto](https://quarto.org/)
+  - Excel output supported
+  - Updated virtual gene panels (Genomics England PanelApp, Cancer Gene Census)
+  - Reference data updates, most importantly including 
+    - ClinVar - May 2024
+    - CIViC - May 2024
+    - GENCODE - v45
+  - Software updates - VEP 111
+  - Extensive code clean-up and re-structuring
 
 * *November 2022*: **1.0.1 release**
   * Added CPSR logo (designed by [Hal Nakken](https://halvetica.net))

diff --git a/vignettes/CHANGELOG.Rmd b/vignettes/CHANGELOG.Rmd
@@ -3,6 +3,33 @@ title: "Changelog"
 output: rmarkdown::html_document
 ---
 
+## v2.0.0
+
+* Date: **2024-05-xx**
+* Data updates
+  * ClinVar
+  * GWAS catalog
+  * CIViC
+  * GENCODE
+  * Cancer Gene Census
+  * PanelApp
+  * Disease Ontology/EFO
+  * UniProt KB
+
+##### Added/changed
+
+- New report generation framework - [quarto](https://quarto.org)
+  - multiple options related to Rmarkdown output are now deprecated
+- Re-organized data bundle structure
+  - Users need to download an assembly-specific VEP cache separately from PCGR/CPSR
+- Re-engineered data bundle generation pipeline
+- Improved data bundle documentation
+  - An HTML report with an overview of the contents of the data bundle is shipped with the reference data itself.
+- Cleaned up code base for reporting and classification
+- Software now also offers a multi-sheet Excel workbook output with variant classifications and biomarker findings, suitable e.g. for aggregating results across samples
+
+#### 
+
 ## v1.0.1
 
 * Date: **2022-11-11**

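The changelog entry above notes that an assembly-specific VEP cache must now be downloaded separately from the PCGR/CPSR reference data, and the updated `running.Rmd` usage text adds `--vep_dir` as a required argument. Below is a minimal pre-flight sketch that runs before invoking `cpsr`. It assumes VEP's conventional cache layout `<vep_dir>/homo_sapiens/<release>_<GRCh37|GRCh38>` and defaults to release 111, the VEP version named in the release notes. The function `check_vep_cache` is hypothetical and not part of CPSR, and whether CPSR expects exactly this layout is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of CPSR.
# Assumes VEP's conventional cache layout:
#   <vep_dir>/homo_sapiens/<vep_release>_<GRCh37|GRCh38>
check_vep_cache() {
  local vep_dir="$1" assembly="$2" vep_release="${3:-111}"
  local asm
  # Map the CPSR --genome_assembly value to VEP's assembly naming
  case "$assembly" in
    grch37) asm="GRCh37" ;;
    grch38) asm="GRCh38" ;;
    *) echo "unsupported assembly: $assembly" >&2; return 2 ;;
  esac
  # Strip a trailing slash so "~/.vep/" and "~/.vep" behave the same
  local cache_dir="${vep_dir%/}/homo_sapiens/${vep_release}_${asm}"
  if [ -d "$cache_dir" ]; then
    echo "found VEP cache: $cache_dir"
  else
    echo "VEP cache not found: $cache_dir" >&2
    return 1
  fi
}
```

For example, `check_vep_cache ~/.vep grch37` would look for `~/.vep/homo_sapiens/111_GRCh37`. It returns 0 if found, 1 if the directory is missing, and 2 for an unsupported assembly value.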
diff --git a/vignettes/annotation_resources.Rmd b/vignettes/annotation_resources.Rmd
@@ -15,19 +15,19 @@ output: rmarkdown::html_document
   * [Cancer Hotspots](http://cancerhotspots.org) - a resource for statistically significant mutations in cancer (v2, 2017)
 
 ### Variant databases of clinical utility
-  * [ClinVar](http://www.ncbi.nlm.nih.gov/clinvar/) - database of clinically related variants (March 2024)
-  * [CIViC](https://civicdb.org) - clinical interpretations of variants in cancer (February 29th 2024)
+  * [ClinVar](http://www.ncbi.nlm.nih.gov/clinvar/) - database of clinically related variants (May 2024)
+  * [CIViC](https://civicdb.org) - clinical interpretations of variants in cancer (May 23rd 2024)
 
 ### Protein domains/functional features
-  * [UniProt/SwissProt KnowledgeBase](http://www.uniprot.org) - resource on protein sequence and functional information (2024_01)
+  * [UniProt/SwissProt KnowledgeBase](http://www.uniprot.org) - resource on protein sequence and functional information (2024_02)
   * [Pfam](http://pfam.xfam.org) - database of protein families and domains (v35.0, November 2021)
 
 ### Cancer gene knowledge bases
   * [CancerMine](http://bionlp.bcgsc.ca/cancermine/) - Literature-mined database of tumor suppressor genes/proto-oncogenes (v50, March 2023)
-  * [Genomics England PanelApp](https://panelapp.genomicsengland.co.uk) - cancer phenotype panels as of February 2nd 2024
-  * [Cancer Gene Census](https://www.sanger.ac.uk/data/cancer-gene-census/) - genes implicated with cancer susceptibility (v99)
+  * [Genomics England PanelApp](https://panelapp.genomicsengland.co.uk) - cancer phenotype panels as of May 2024
+  * [Cancer Gene Census](https://www.sanger.ac.uk/data/cancer-gene-census/) - genes implicated with cancer susceptibility (v100)
 
 ### Phenotype ontologies
-  * [UMLS/MedGen](https://www.ncbi.nlm.nih.gov/medgen/) - February 2024
-  * [Disease Ontology](https://disease-ontology.org/) - December 2023
-  * [Experimental Factor Ontology](https://github.com/EBISPOT/efo) - v3.62.0
+  * [UMLS/MedGen](https://www.ncbi.nlm.nih.gov/medgen/) - May 2024
+  * [Disease Ontology](https://disease-ontology.org/) - April 2024
+  * [Experimental Factor Ontology](https://github.com/EBISPOT/efo) - v3.66.0
diff --git a/vignettes/output.Rmd b/vignettes/output.Rmd
@@ -292,7 +292,7 @@ We provide an Excel workbook with **four** sheets that lists main findings and a
 
 - `<sample_id>.cpsr.<genome_assembly>.xlsx`
 
-The four sheets of the workbook contains the following:
+The Excel workbook is populated with the following sheets (provided that the corresponding data are available):
 
  - *VIRTUAL_PANEL* - details on the chosen virtual gene panel 
  - *CLASSIFICATION* - variant classifications and corresponding gene annotations

diff --git a/vignettes/running.Rmd b/vignettes/running.Rmd
@@ -83,7 +83,9 @@ usage:
 	Required arguments:
 	--input_vcf INPUT_VCF
 				    VCF input file with germline query variants (SNVs/InDels).
-	--refdata_dir REFDATA_DIR   Directory that contains the uncompressed PCGR/CPSR reference data bundle 
+	--vep_dir VEP_DIR     Directory of VEP cache, e.g.  $HOME/.vep
+  --refdata_dir REFDATA_DIR
+                        Directory that contains the PCGR/CPSR reference data, e.g. ~/pcgr-data-1.4.1.9003 
 	--output_dir OUTPUT_DIR
 				    Output directory
 	--genome_assembly {grch37,grch38}
@@ -166,7 +168,7 @@ usage:
 	--vep_buffer_size VEP_BUFFER_SIZE
 				    Variant buffer size (variants read into memory simultaneously, option '--buffer_size' in VEP)
 				    - set lower to reduce memory usage, default: 500
-	--vep_gencode_basic   Consider only basic GENCODE transcripts with Variant Effect Predictor (VEP) (option '--gencode_basic' in VEP is used by default).
+	--vep_gencode_basic   Consider only basic GENCODE transcripts with Variant Effect Predictor (VEP) (option '--gencode_basic' in VEP).
 	--vep_pick_order VEP_PICK_ORDER
 				    Comma-separated string of ordered transcript properties for primary variant pick
 					( option '--pick_order' in VEP), default: mane_select,mane_plus_clinical,canonical,appris,tsl,biotype,ccds,rank,length
@@ -180,38 +182,29 @@ usage:
 	--force_overwrite     By default, the script will fail with an error if any output file already exists.
 					You can force the overwrite of existing result files by using this flag, default: False
 	--version             show program's version number and exit
-	--no_reporting        Run functional variant annotation on VCF through VEP/vcfanno, omit tier assignment/report generation (STEP 4), default: False
+	--no_reporting        Run functional variant annotation on VCF through VEP/vcfanno, omit tier assignment/report generation, default: False
 	--retained_info_tags RETAINED_INFO_TAGS
 				    Comma-separated string of VCF INFO tags from query VCF that should be kept in CPSR output TSV
-	--docker_uid DOCKER_USER_ID
-				    Docker user ID. Default is the host system user ID. If you are experiencing permission errors,
-					try setting this up to root (`--docker_uid root`), default: None
-	--no_docker           Run the CPSR workflow in a non-Docker mode, default: False
-
-	--report_theme {default,cerulean,journal,flatly,readable,spacelab,united,cosmo,lumen,paper,sandstone,simplex,yeti}
-				    Visual report theme (rmarkdown), default: default
-	--report_nonfloating_toc
-				    Do not float the table of contents (TOC) in output HTML report, default: False
-	--report_table_display {full,light}
-				    Set the level of detail/comprehensiveness in interactive datables of HTML report, very comprehensive (option 'full') or slim/focused ('light')
 	--ignore_noncoding    Do not list non-coding variants in HTML report, default: False
 	--debug            Print full docker commands to log, default: False
+	--pcgrr_conda PCGRR_CONDA
+                        pcgrr conda env name (default: pcgrr)
 ```
 
 ## Example run
 
-The *cpsr* software bundle contains an example VCF file.
+The *cpsr* R package comes with a test VCF file (GRCh37; the variants are not germline calls from a real patient) that can be used to test the CPSR pipeline.
 
 Report generation with the example VCF, using the [Adult solid tumours cancer susceptibility](https://panelapp.genomicsengland.co.uk/panels/245/) as the virtual gene panel, can be performed through the following command:
 
 ```bash
 $ (base) conda activate pcgr
 $ (pcgr)
 cpsr \
-	 --input_vcf ~/cpsr-1.2/example.vcf.gz \
+	 --input_vcf ~/cpsr-2.0.0/inst/examples/example.vcf.gz \
 	 --vep_dir ~/.vep \
 	 --refdata_dir ~/pcgr_ref_data \
-	 --output_dir ~/cpsr-1.2 \
+	 --output_dir ~/cpsr-2.0.0/ \
 	 --genome_assembly grch37 \
 	 --panel_id 1 \
 	 --sample_id example \
@@ -221,18 +214,18 @@ cpsr \
 	 --force_overwrite
 ```
 
-Note that the example command also refers to the PCGR data bundle directory (*pcgr_db*), which contains the data bundle that are necessary for both *PCGR* and *CPSR*.
+Note that the example command also refers to the PCGR data bundle directory (*refdata_dir*), which contains the reference data needed by both *PCGR* and *CPSR*.
 
 This command will produce the following output files in the _output_ folder:
 
   1. __example.cpsr.grch37.vcf.gz (.tbi)__ - Bgzipped VCF file with relevant annotations appended by CPSR
   2. __example.cpsr.grch37.pass.vcf.gz (.tbi)__ - Bgzipped VCF file with relevant annotations appended by CPSR (PASS variants only)
-  3. __example.cpsr.grch37.yaml__ - CPSR configuration file - output from pre-reporting (Python) workflow
+  3. __example.cpsr.grch37.conf.yaml__ - CPSR configuration file - output from pre-reporting annotation (Python) workflow
   4. __example.cpsr.grch37.pass.tsv.gz__ - Compressed TSV file (generated with [vcf2tsvpy](https://github.com/sigven/vcf2tsvpy)) of VCF content with relevant annotations appended by CPSR
-  5. __example.cpsr.grch37.xlsx__ - A four-sheet Excel workbook that contains
+  5. __example.cpsr.grch37.xlsx__ - An Excel workbook that contains
       * _i)_ information on virtual gene panel interrogated for variants
-      * _ii)_ classification of variants found in input VCF
-      * _iii)_ match of variants with existing biomarkers
-      * _iv)_ secondary findings
+      * _ii)_ classification of clinical significance for variants overlapping with cancer predisposition genes
+      * _iii)_ match of variants with existing biomarkers (if any found)
+      * _iv)_ secondary findings (if any found)
   6. __example.cpsr.grch37.html__ - Interactive HTML report with clinically relevant variants in cancer predisposition genes
-  7. __example.cpsr.grch37.snvs_indels.classification.tsv.gz__ - TSV file with key annotations of SNVs/InDels classified according to clinical signififance
+  7. __example.cpsr.grch37.snvs_indels.classification.tsv.gz__ - TSV file with key annotations of SNVs/InDels classified according to clinical significance
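The output files listed above share the prefix `<sample_id>.cpsr.<genome_assembly>`. Below is a minimal sketch, not part of CPSR, that checks whether a finished run left each of them in the output directory and non-empty. It assumes the naming shown in the list, with tabix indexes named `<file>.vcf.gz.tbi`. The function name `check_cpsr_outputs` is hypothetical. A run with `--no_reporting` skips report generation, so at least the HTML report would be expected to be missing in that case.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of CPSR: verify that the expected output
# files <sample_id>.cpsr.<genome_assembly>.<suffix> exist and are non-empty.
check_cpsr_outputs() {
  local outdir="$1" sample_id="$2" assembly="$3"
  local prefix="${outdir%/}/${sample_id}.cpsr.${assembly}"
  local missing=0 suffix
  for suffix in \
    vcf.gz vcf.gz.tbi \
    pass.vcf.gz pass.vcf.gz.tbi \
    conf.yaml \
    pass.tsv.gz \
    xlsx \
    html \
    snvs_indels.classification.tsv.gz; do
    # -s is true only for files that exist and have a size greater than zero
    if [ ! -s "${prefix}.${suffix}" ]; then
      echo "missing or empty: ${prefix}.${suffix}" >&2
      missing=$((missing + 1))
    fi
  done
  # Exit status is the number of missing files (0 means all present)
  return "$missing"
}
```

Usage with the example run: `check_cpsr_outputs ~/cpsr-2.0.0/ example grch37 && echo "all outputs present"`.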