data_sets
This dataset contains example fusion output from the chimeraviz BioConductor package.
Root directory: /home/projects/hackseq17_3/datasets/chimeraviz_examples/
Relevant files:
- FusionMap_01_TestDataset_InputFastq.FusionReport.txt - FusionMap output
- PRADA.acc.fusion.fq.TAF.tsv - PRADA output
- defuse_833ke_results.filtered.tsv - DeFuse output
- ericscript_SRR1657556.results.total.tsv - EricScript output
- fusioncatcher_833ke_final-list-candidate-fusion-genes.txt - FusionCatcher output
- infusion_fusions.txt - Infusion output
- jaffa_results.csv - JAFFA results
- soapfuse_833ke_final.Fusion.specific.for.genes - SOAPFuse results
- star-fusion.fusion_candidates.final.abridged.txt - STARfusion results
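The file list for this dataset can be captured as a small lookup table, which may be handy when writing per-tool loaders. This is only a minimal sketch: the names `CHIMERAVIZ_ROOT`, `CHIMERAVIZ_FILES`, and `chimeraviz_paths` are illustrative assumptions and are not part of chimeraviz or any of the listed tools.

```python
from pathlib import PurePosixPath

# Root directory and file names exactly as listed in this section.
CHIMERAVIZ_ROOT = "/home/projects/hackseq17_3/datasets/chimeraviz_examples/"

# Hypothetical lookup table: fusion caller -> example output file name.
CHIMERAVIZ_FILES = {
    "FusionMap": "FusionMap_01_TestDataset_InputFastq.FusionReport.txt",
    "PRADA": "PRADA.acc.fusion.fq.TAF.tsv",
    "DeFuse": "defuse_833ke_results.filtered.tsv",
    "EricScript": "ericscript_SRR1657556.results.total.tsv",
    "FusionCatcher": "fusioncatcher_833ke_final-list-candidate-fusion-genes.txt",
    "Infusion": "infusion_fusions.txt",
    "JAFFA": "jaffa_results.csv",
    "SOAPFuse": "soapfuse_833ke_final.Fusion.specific.for.genes",
    "STARfusion": "star-fusion.fusion_candidates.final.abridged.txt",
}


def chimeraviz_paths(root=CHIMERAVIZ_ROOT):
    """Return a dict mapping each tool name to the full path of its example file."""
    base = PurePosixPath(root)
    return {tool: str(base / name) for tool, name in CHIMERAVIZ_FILES.items()}
```

Calling `chimeraviz_paths()` returns the ORCA paths; passing a different `root` lets the same table be reused against a local copy of the directory.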
This data set contains a small set of FASTQ files with reads supporting the fusions described on the FusionCatcher GitHub page.
Root directory: /home/projects/hackseq17_3/datasets/fusioncatcher_examples/
Relevant files:
- final-list_candidate-fusion-genes.txt - fusioncatcher output
- readme.txt - README describing the detected fusions
- reads_1.fq.gz - Read 1 file
- reads_2.fq.gz - Read 2 file
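To take a quick look at the read files, a gzipped FASTQ can be streamed with the Python standard library. The sketch below assumes plain four-line FASTQ records (no wrapped sequence lines); the function names `read_fastq` and `_records` are illustrative and not taken from any tool mentioned here.

```python
import gzip
from itertools import islice


def _records(handle):
    """Yield (read_id, sequence, quality) from an open text handle of 4-line FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a trailing blank line)
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


def read_fastq(path, limit=None):
    """Yield up to `limit` records from a gzipped FASTQ file (all records if None)."""
    with gzip.open(path, "rt") as handle:
        yield from islice(_records(handle), limit)
```

For example, `list(read_fastq("reads_1.fq.gz", limit=5))` would return the first five records of the Read 1 file as tuples.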
This data set contains fusion results for three technical replicates from an AML cell line.
Root directory: /home/projects/hackseq17_3/datasets/aml_cell_line_examples/
Under the root directory, you'll find directories named by tool and library. There are results for fusioncatcher, defuse, ericscript, STAR-fusion (with Oncofuse annotations), and PAVfinder.
- Located online at PanCanFusV2
- Downloaded to ORCA at /home/projects/hackseq17_3/annotation_sources/tumour_fusion_gene_data_portal/
- Contains 17,754 observations of 27 variables, in a format that is amenable to conversion to BEDPE
- Annotations include recurrence in TCGA tumour types, as well as additional manual and automated curations
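As a rough illustration of what a BEDPE conversion could look like, the sketch below turns one fusion record into a 10-column BEDPE line (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2). The input keys (`chrom_a`, `breakpoint_a`, `gene_a`, and so on) are hypothetical and do not reflect the actual column names in the downloaded table, and the breakpoints are assumed to be 1-based positions.

```python
def fusion_to_bedpe(fusion):
    """Convert one fusion record (a dict with hypothetical keys) to a BEDPE line.

    Assumes 1-based breakpoint positions; BEDPE uses 0-based, half-open
    intervals, so each breakpoint becomes the interval [pos - 1, pos).
    """
    pos_a = int(fusion["breakpoint_a"])
    pos_b = int(fusion["breakpoint_b"])
    fields = [
        fusion["chrom_a"], pos_a - 1, pos_a,
        fusion["chrom_b"], pos_b - 1, pos_b,
        f'{fusion["gene_a"]}--{fusion["gene_b"]}',  # name column
        fusion.get("score", "."),                   # '.' when no score is given
        fusion.get("strand_a", "."),                # '.' when strand is unknown
        fusion.get("strand_b", "."),
    ]
    return "\t".join(str(field) for field in fields)


# Made-up record, for illustration only (not real data).
example = {
    "chrom_a": "chr1", "breakpoint_a": 1000, "gene_a": "GENE_A", "strand_a": "+",
    "chrom_b": "chr2", "breakpoint_b": 2000, "gene_b": "GENE_B", "strand_b": "-",
}
print(fusion_to_bedpe(example))
```

The real column mapping would need to be checked against the 27 variables in the downloaded file before using anything like this.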
- Located online at DGV
- Latest release of GRCh37 dataset is on ORCA at /home/projects/hackseq17_3/annotation_sources/dgv/
- Contains 392,583 observations of 20 variables. These are mainly CNVs, insertions, and deletions, so this source is probably less relevant here
- Issue #14
- Fusioncatcher bundles a large number of annotation resources
- These are described in the Fusioncatcher manual
- Most of them are simply lists of Ensembl gene IDs
- Downloaded on ORCA at /home/projects/hackseq17_3/tools/fusioncatcher_install/fusioncatcher/data/human_v89/
- Issue #21
- Atlas of Genetics and Cytogenetics in Oncology and Haematology
- There appears to be no API or download access
- This database contains a lot of well-curated information, but it may only be possible to query it through the web interface
- Depending on the eventual review interface, it may be possible to link to this resource, but it does not look possible to include it in an automated way
- CIViC contains clinical interpretations of variants in cancer
- There is both API and bulk download access
- September 2017 release has been downloaded on ORCA at /home/projects/hackseq17_3/annotation_sources/civic/
- The contents of the individual files are described in #13
- It looks like there are ~85 annotated fusions
- ChimerDB
- Contains three kinds of data:
- "ChimerKB represents a knowledgebase including 1,066 fusion genes with manual curation that were compiled from public resources of fusion genes with experimental evidences."
- "ChimerPub includes 2,767 fusion genes obtained from text mining of PubMed abstracts."
- "ChimerSeq module is designed to archive the fusion candidates from deep sequencing data."
- These data files are available as MySQL and Excel-formatted dump files
- These aren't downloaded to ORCA yet
- TICdb
- 1,374 annotated fusions, with annotations of the gene partners and the actual fusion sequence, and links to PubMed or GenBank
- Not downloaded to ORCA yet
- ChiTaRS
- 20,754 annotations for humans, downloaded from http://chitars.bioinfo.cnio.es/downloads.html
- Downloaded to ORCA at: /home/projects/hackseq17_3/annotation_sources/chitars/all_human_ChiTaRS_coord.csv
- Note that this data source does not appear to have been updated since late 2014
- Catalogue of Somatic Mutations in Cancer
- Cell Lines Project
- There is an API for querying COSMIC, and bulk downloads are available at downloads
- The downloads are via SFTP; @rdocking has credentials but hasn't downloaded things to ORCA yet. See also #29