docs update
sigven committed May 30, 2024
1 parent 83aaae3 commit 31dbff9
Showing 7 changed files with 78 additions and 49 deletions.
24 changes: 12 additions & 12 deletions README.md
@@ -1,6 +1,6 @@
# Cancer Predisposition Sequencing Reporter <a href="https://sigven.github.io/cpsr/"><img src="man/figures/logo.png" align="right" height="118" width="100"/></a>

The *Cancer Predisposition Sequencing Reporter (CPSR)* is a computational workflow that **interprets and classifies germline DNA variants** identified from next-generation sequencing **in the context of cancer predisposition and inherited cancer syndromes**. The workflow can also report **incidental findings (ACMG v3.0)** as well as the genotypes of common germline variants associated with cancer risk, as reported in the **NHGRI-EBI GWAS catalog**.
The *Cancer Predisposition Sequencing Reporter (CPSR)* is a computational workflow that **interprets and classifies the clinical significance of germline DNA variants** identified from next-generation sequencing **in the context of cancer predisposition and inherited cancer syndromes**. The workflow can also report **incidental findings (ACMG v3.2)**.

The CPSR workflow is integrated with the framework that underlies the [Personal Cancer Genome Reporter - PCGR](https://github.com/sigven/pcgr). While *PCGR* is intended for reporting and analysis of somatic variants detected in a tumor, *CPSR* is intended for reporting and ranking of germline variants in protein-coding genes that are implicated in cancer predisposition and inherited cancer syndromes.

@@ -12,11 +12,21 @@ Snapshots of sections in the cancer predisposition genome report:

## News

- *May 2024*: **2.x.x release**
  - New HTML report generation and layout with [quarto](https://quarto.org/)
  - Excel output supported
  - Updated virtual gene panels (Genomics England PanelApp, Cancer Gene Census)
  - Reference data updates, most importantly including
    - ClinVar - May 2024
    - CIViC - May 2024
    - GENCODE - v45
  - Software updates - VEP 111
  - Extensive code clean-up and re-structuring

- *November 2022*: **1.0.1 release**
  - Added CPSR logo (designed by [Hal Nakken](https://halvetica.net))

- *February 2022*: **1.0.0 release**

  - Complete restructure of code and Conda installation routines, contributed largely by the great [@pdiakumis](https://github.com/pdiakumis)
  - Updated data bundle
    - ClinVar - Feb 2022
@@ -27,16 +37,6 @@ Snapshots of sections in the cancer predisposition genome report:
  - Software upgrade (VEP 105, R/BioConductor)
  - New documentation site ([https://sigven.github.io/cpsr](https://sigven.github.io/cpsr))

- *June 30th 2021*: **0.6.2 release**

  - Updated bundle (ClinVar, CancerMine, UniprotKB, PanelApp, CIViC, GWAS catalog)
  - Software upgrade (VEP, R/BioConductor)
  - [CHANGELOG](http://cpsr.readthedocs.io/en/latest/CHANGELOG.html)

- *November 30th 2020*: **0.6.1 release**

  - Updated bundle (ClinVar, CancerMine, UniprotKB, CIViC, GWAS catalog)
  - [CHANGELOG](http://cpsr.readthedocs.io/en/latest/CHANGELOG.html)

## Example report

4 changes: 2 additions & 2 deletions pkgdown/_pkgdown.yml
@@ -25,8 +25,8 @@ navbar:
type: light
bg: info
structure:
left: [installation, running, reference, articles]
right: [search, changelog, github]
left: [installation, running, articles, changelog]
right: [search, reference, github]
components:
home:
text: Intro
13 changes: 11 additions & 2 deletions pkgdown/index.md
@@ -20,11 +20,20 @@ Snapshots of sections in the cancer predisposition genome report:

![](img/cpsr_views.png)

<br>

### News

* *March 2024*: **1.xxx release**
  * Major bundle update
* *May 2024*: **2.x.x release**
  - New HTML report generation and layout with [quarto](https://quarto.org/)
  - Excel output supported
  - Updated virtual gene panels (Genomics England PanelApp, Cancer Gene Census)
  - Reference data updates, most importantly including
    - ClinVar - May 2024
    - CIViC - May 2024
    - GENCODE - v45
  - Software updates - VEP 111
  - Extensive code clean-up and re-structuring

* *November 2022*: **1.0.1 release**
  * Added CPSR logo (designed by [Hal Nakken](https://halvetica.net))
27 changes: 27 additions & 0 deletions vignettes/CHANGELOG.Rmd
@@ -3,6 +3,33 @@ title: "Changelog"
output: rmarkdown::html_document
---

## v2.0.0

* Date: **2024-05-xx**
* Data updates
  * ClinVar
  * GWAS catalog
  * CIViC
  * GENCODE
  * Cancer Gene Census
  * PanelApp
  * Disease Ontology/EFO
  * UniProt KB

##### Added/changed

- New report generation framework - [quarto](https://quarto.org)
  - Multiple options related to Rmarkdown output are now deprecated
- Re-organized data bundle structure
  - Users need to download an assembly-specific VEP cache separately from PCGR/CPSR
- Re-engineered data bundle generation pipeline
- Improved data bundle documentation
  - An HTML report with an overview of the contents of the data bundle is shipped with the reference data itself.
- Cleaned up code base for reporting and classification
- The software now also offers a multi-sheet Excel workbook with variant classifications and biomarker findings, suitable e.g. for aggregating results across samples
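
Because v2.0.0 expects the assembly-specific VEP cache to be downloaded separately, a quick check before a run can catch a missing cache early. The sketch below is a hypothetical helper, not part of CPSR; it assumes VEP's usual cache layout `<vep_dir>/homo_sapiens/<release>_<GRCh37|GRCh38>` and defaults to VEP release 111, the version listed for this release.

```bash
# Hypothetical helper, not part of CPSR: confirm that an assembly-specific
# VEP cache exists before launching cpsr.
# Assumes the usual VEP cache layout: <vep_dir>/homo_sapiens/<release>_<GRCh37|GRCh38>
check_vep_cache() {
  local vep_dir="$1"             # value you would pass to --vep_dir, e.g. "$HOME/.vep"
  local genome_assembly="$2"     # value you would pass to --genome_assembly
  local vep_release="${3:-111}"  # assumed default; VEP 111 per the 2.x release notes
  local assembly_dir

  # Map the CPSR assembly names to the directory suffix used by VEP caches
  case "$genome_assembly" in
    grch37) assembly_dir="GRCh37" ;;
    grch38) assembly_dir="GRCh38" ;;
    *) echo "unsupported assembly: $genome_assembly" >&2; return 2 ;;
  esac

  local cache_path="${vep_dir}/homo_sapiens/${vep_release}_${assembly_dir}"
  if [ -d "$cache_path" ]; then
    echo "found VEP cache: $cache_path"
    return 0
  fi
  echo "missing VEP cache: $cache_path" >&2
  return 1
}
```

For example, `check_vep_cache "$HOME/.vep" grch37 && cpsr --vep_dir "$HOME/.vep" ...` only starts the run when the expected cache directory is present. The check only tests that the directory exists, not that the cache is complete.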


## v1.0.1

* Date: **2022-11-11**
16 changes: 8 additions & 8 deletions vignettes/annotation_resources.Rmd
@@ -15,19 +15,19 @@ output: rmarkdown::html_document
* [Cancer Hotspots](http://cancerhotspots.org) - a resource for statistically significant mutations in cancer (v2, 2017)

### Variant databases of clinical utility
* [ClinVar](http://www.ncbi.nlm.nih.gov/clinvar/) - database of clinically related variants (March 2024)
* [CIViC](https://civicdb.org) - clinical interpretations of variants in cancer (February 29th 2024)
* [ClinVar](http://www.ncbi.nlm.nih.gov/clinvar/) - database of clinically related variants (May 2024)
* [CIViC](https://civicdb.org) - clinical interpretations of variants in cancer (May 23rd 2024)

### Protein domains/functional features
* [UniProt/SwissProt KnowledgeBase](http://www.uniprot.org) - resource on protein sequence and functional information (2024_01)
* [UniProt/SwissProt KnowledgeBase](http://www.uniprot.org) - resource on protein sequence and functional information (2024_02)
* [Pfam](http://pfam.xfam.org) - database of protein families and domains (v35.0, November 2021)

### Cancer gene knowledge bases
* [CancerMine](http://bionlp.bcgsc.ca/cancermine/) - Literature-mined database of tumor suppressor genes/proto-oncogenes (v50, March 2023)
* [Genomics England PanelApp](https://panelapp.genomicsengland.co.uk) - cancer phenotype panels as of February 2nd 2024
* [Cancer Gene Census](https://www.sanger.ac.uk/data/cancer-gene-census/) - genes implicated with cancer susceptibility (v99)
* [Genomics England PanelApp](https://panelapp.genomicsengland.co.uk) - cancer phenotype panels as of May 2024
* [Cancer Gene Census](https://www.sanger.ac.uk/data/cancer-gene-census/) - genes implicated with cancer susceptibility (v100)

### Phenotype ontologies
* [UMLS/MedGen](https://www.ncbi.nlm.nih.gov/medgen/) - February 2024
* [Disease Ontology](https://disease-ontology.org/) - December 2023
* [Experimental Factor Ontology](https://github.com/EBISPOT/efo) - v3.62.0
* [UMLS/MedGen](https://www.ncbi.nlm.nih.gov/medgen/) - May 2024
* [Disease Ontology](https://disease-ontology.org/) - April 2024
* [Experimental Factor Ontology](https://github.com/EBISPOT/efo) - v3.66.0
2 changes: 1 addition & 1 deletion vignettes/output.Rmd
@@ -292,7 +292,7 @@ We provide an Excel workbook with **four** sheets that lists main findings and a

- `<sample_id>.cpsr.<genome_assembly>.xlsx`

The four sheets of the workbook contains the following:
The Excel workbook is populated with the following sheets (provided that the relevant data are available):

- *VIRTUAL_PANEL* - details on the chosen virtual gene panel
- *CLASSIFICATION* - variant classifications and corresponding gene annotations
41 changes: 17 additions & 24 deletions vignettes/running.Rmd
@@ -83,7 +83,9 @@ usage:
Required arguments:
--input_vcf INPUT_VCF
VCF input file with germline query variants (SNVs/InDels).
--refdata_dir REFDATA_DIR Directory that contains the uncompressed PCGR/CPSR reference data bundle
--vep_dir VEP_DIR Directory of VEP cache, e.g. $HOME/.vep
--refdata_dir REFDATA_DIR
Directory that contains the PCGR/CPSR reference data, e.g. ~/pcgr-data-1.4.1.9003
--output_dir OUTPUT_DIR
Output directory
--genome_assembly {grch37,grch38}
@@ -166,7 +168,7 @@ usage:
--vep_buffer_size VEP_BUFFER_SIZE
Variant buffer size (variants read into memory simultaneously, option '--buffer_size' in VEP)
- set lower to reduce memory usage, default: 500
--vep_gencode_basic Consider only basic GENCODE transcripts with Variant Effect Predictor (VEP) (option '--gencode_basic' in VEP is used by default).
--vep_gencode_basic Consider only basic GENCODE transcripts with Variant Effect Predictor (VEP) (option '--gencode_basic' in VEP).
--vep_pick_order VEP_PICK_ORDER
Comma-separated string of ordered transcript properties for primary variant pick
( option '--pick_order' in VEP), default: mane_select,mane_plus_clinical,canonical,appris,tsl,biotype,ccds,rank,length
@@ -180,38 +182,29 @@ usage:
--force_overwrite By default, the script will fail with an error if any output file already exists.
You can force the overwrite of existing result files by using this flag, default: False
--version show program's version number and exit
--no_reporting Run functional variant annotation on VCF through VEP/vcfanno, omit tier assignment/report generation (STEP 4), default: False
--no_reporting Run functional variant annotation on VCF through VEP/vcfanno, omit tier assignment/report generation, default: False
--retained_info_tags RETAINED_INFO_TAGS
Comma-separated string of VCF INFO tags from query VCF that should be kept in CPSR output TSV
--docker_uid DOCKER_USER_ID
Docker user ID. Default is the host system user ID. If you are experiencing permission errors,
try setting this up to root (`--docker_uid root`), default: None
--no_docker Run the CPSR workflow in a non-Docker mode, default: False
--report_theme {default,cerulean,journal,flatly,readable,spacelab,united,cosmo,lumen,paper,sandstone,simplex,yeti}
Visual report theme (rmarkdown), default: default
--report_nonfloating_toc
Do not float the table of contents (TOC) in output HTML report, default: False
--report_table_display {full,light}
Set the level of detail/comprehensiveness in interactive datables of HTML report, very comprehensive (option 'full') or slim/focused ('light')
--ignore_noncoding Do not list non-coding variants in HTML report, default: False
--debug Print full docker commands to log, default: False
--pcgrr_conda PCGRR_CONDA
pcgrr conda env name (default: pcgrr)
```
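
`--retained_info_tags` takes INFO tag IDs as declared in the query VCF header. One way to see which tags are available is to read the `##INFO=<ID=...>` header lines. The sketch below is a hypothetical helper, not part of CPSR. It reads plain or gzip/bgzip-compressed VCFs, and it stops at the `#CHROM` line so that the variant records are not scanned.

```bash
# Hypothetical helper, not part of CPSR: print the INFO tag IDs declared in a
# VCF header as one comma-separated string (the format --retained_info_tags takes).
list_info_tags() {
  local vcf="$1"
  {
    case "$vcf" in
      *.gz) gzip -cd "$vcf" ;;  # bgzip output can be read with plain gzip
      *) cat "$vcf" ;;
    esac
  } | awk '
    /^#CHROM/ { exit }              # header ends here; skip the variant records
    /^##INFO=<ID=/ {
      sub(/^##INFO=<ID=/, "")       # drop the prefix ...
      sub(/,.*/, "")                # ... and everything after the ID
      print
    }' | paste -sd, -
}
```

The output lists every declared tag. In practice you would pass only the subset worth keeping in the output TSV, e.g. `--retained_info_tags <TAG1>,<TAG2>`.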

## Example run

The *cpsr* software bundle contains an example VCF file.
The *cpsr* R package comes with a test VCF file (GRCh37; the variants are not germline calls from a real patient) that can be used to test the CPSR pipeline.

Report generation with the example VCF, using the [Adult solid tumours cancer susceptibility](https://panelapp.genomicsengland.co.uk/panels/245/) panel as the virtual gene panel, can be run with the following command:

```bash
(base) $ conda activate pcgr
(pcgr) $ cpsr \
--input_vcf ~/cpsr-1.2/example.vcf.gz \
--input_vcf ~/cpsr-2.0.0/inst/examples/example.vcf.gz \
--vep_dir ~/.vep \
--refdata_dir ~/pcgr_ref_data \
--output_dir ~/cpsr-1.2 \
--output_dir ~/cpsr-2.0.0/ \
--genome_assembly grch37 \
--panel_id 1 \
--sample_id example \
@@ -221,18 +214,18 @@ cpsr \
--force_overwrite
```

Note that the example command also refers to the PCGR data bundle directory (*pcgr_db*), which contains the data bundle that are necessary for both *PCGR* and *CPSR*.
Note that the example command also refers to the PCGR data bundle directory (*refdata_dir*), which contains the reference data needed by both *PCGR* and *CPSR*.

This command will produce the following output files in the _output_ folder:

1. __example.cpsr.grch37.vcf.gz (.tbi)__ - Bgzipped VCF file with relevant annotations appended by CPSR
2. __example.cpsr.grch37.pass.vcf.gz (.tbi)__ - Bgzipped VCF file with relevant annotations appended by CPSR (PASS variants only)
3. __example.cpsr.grch37.yaml__ - CPSR configuration file - output from pre-reporting (Python) workflow
3. __example.cpsr.grch37.conf.yaml__ - CPSR configuration file - output from pre-reporting annotation (Python) workflow
4. __example.cpsr.grch37.pass.tsv.gz__ - Compressed TSV file (generated with [vcf2tsvpy](https://github.com/sigven/vcf2tsvpy)) of VCF content with relevant annotations appended by CPSR
5. __example.cpsr.grch37.xlsx__ - A four-sheet Excel workbook that contains
5. __example.cpsr.grch37.xlsx__ - An Excel workbook that contains
* _i)_ information on virtual gene panel interrogated for variants
* _ii)_ classification of variants found in input VCF
* _iii)_ match of variants with existing biomarkers
* _iv)_ secondary findings
* _ii)_ classification of clinical significance for variants overlapping with cancer predisposition genes
* _iii)_ match of variants with existing biomarkers (if any found)
* _iv)_ secondary findings (if any found)
6. __example.cpsr.grch37.html__ - Interactive HTML report with clinically relevant variants in cancer predisposition genes
7. __example.cpsr.grch37.snvs_indels.classification.tsv.gz__ - TSV file with key annotations of SNVs/InDels classified according to clinical signififance
7. __example.cpsr.grch37.snvs_indels.classification.tsv.gz__ - TSV file with key annotations of SNVs/InDels classified according to clinical significance
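
When CPSR has been run on several samples, the per-sample classification TSVs can be stacked into one table. The sketch below is a hypothetical helper, not part of CPSR, and it makes several assumptions:

* Files follow the `<sample_id>.cpsr.<genome_assembly>.snvs_indels.classification.tsv.gz` naming shown above.
* Each file has exactly one header line.
* All files share the same columns.

It does not rely on any specific column names. It prepends a `SOURCE_SAMPLE` column (a name chosen here for illustration) derived from each file name. Drop that step if the files already carry a sample identifier column.

```bash
# Hypothetical helper, not part of CPSR: stack per-sample classification TSVs
# (<sample_id>.cpsr.<genome_assembly>.snvs_indels.classification.tsv.gz) into one
# table on stdout. Assumes every file has exactly one header line and identical columns.
merge_classification_tsvs() {
  local first=1 f sample
  for f in "$@"; do
    sample="$(basename "$f")"
    sample="${sample%%.cpsr.*}"  # file name up to the first ".cpsr." is the sample ID
    gzip -cd "$f" | awk -v s="$sample" -v first="$first" '
      BEGIN { OFS = "\t" }
      NR == 1 { if (first == 1) print "SOURCE_SAMPLE", $0; next }  # keep one header
      { print s, $0 }                                              # tag each record
    '
    first=0
  done
}
```

Usage: `merge_classification_tsvs results/*.snvs_indels.classification.tsv.gz > cohort.classification.tsv`.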
