TIGA Workflow

Steps for updating the TIGA dataset from sources.

Dependencies

R 3.6+; readr, data.table, igraph, muStat, RMySQL (Webapp: shiny, DT, shinyBS, shinysky, plotly)
Python 3.7+; pandas, BioClients
Java 8+; Jena, IU_IDSL_JENA

Steps

Download the latest files from the NHGRI-EBI GWAS Catalog. See the Catalog FTP site for the latest and all previous releases. Required files:
- gwas-catalog-studies_ontology-annotated.tsv
- gwas-catalog-associations_ontology-annotated.tsv
Run the commands in Go_TIGA_Workflow.sh, as described in the following steps.
Download from the Experimental Factor Ontology (EFO):
- efo.owl
Clean studies:
- gwascat_gwas.R
Clean associations and separate OR_or_beta into oddsratio and beta columns:
- gwascat_assn.R
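The OR/beta split can be sketched in Python as below. This is a minimal illustration, not the logic of gwascat_assn.R: it assumes, as a heuristic, that a beta coefficient can be recognized because the Catalog's 95% CI text for it names a direction such as "unit increase" or "decrease", while the CI text for an odds ratio does not. The function name split_or_beta is hypothetical.

```python
import re
from typing import Optional, Tuple

# Heuristic (assumption): a beta coefficient's 95% CI text names a direction,
# e.g. "[0.02-0.06] unit increase"; an odds ratio's CI text does not.
BETA_HINT = re.compile(r"\b(increase|decrease)\b", re.IGNORECASE)


def split_or_beta(or_or_beta: str, ci_text: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (oddsratio, beta) for one association; the unused slot is None."""
    try:
        value = float(or_or_beta)
    except ValueError:
        # Empty or non-numeric field: neither an odds ratio nor a beta.
        return None, None
    if BETA_HINT.search(ci_text or ""):
        return None, value
    return value, None
```

For example, split_or_beta("0.03", "[0.01-0.05] unit increase") yields (None, 0.03), so the value lands in the beta column.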
Convert EFO OWL to TSV:
- java -jar iu_idsl_jena-0.0.1-SNAPSHOT-jar-with-dependencies.jar
From the EFO TSV, create GraphML:
- efo_graph.R
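As a rough sketch of the GraphML step, the Python below turns subclass-to-superclass edges into a directed GraphML document using only the standard library. It assumes the EFO TSV has already been reduced to (child ID, parent ID) pairs plus an ID-to-label map; the actual column layout of the TSV written by the Jena tool, and what efo_graph.R stores on nodes and edges, may differ. The function name edges_to_graphml is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Tuple

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def edges_to_graphml(edges: Iterable[Tuple[str, str]], labels: Dict[str, str]) -> str:
    """Build a directed GraphML string; each edge points from child to parent."""
    edges = list(edges)  # iterated twice below
    # xmlns is written as a plain attribute so child tags can stay unqualified.
    root = ET.Element("graphml", attrib={"xmlns": GRAPHML_NS})
    ET.SubElement(root, "key", attrib={
        "id": "label", "for": "node", "attr.name": "label", "attr.type": "string",
    })
    graph = ET.SubElement(root, "graph", attrib={"id": "EFO", "edgedefault": "directed"})

    # Collect node IDs in first-seen order: labelled terms, then edge endpoints.
    node_ids = dict.fromkeys(labels)
    for child, parent in edges:
        node_ids.setdefault(child)
        node_ids.setdefault(parent)

    for node_id in node_ids:
        node = ET.SubElement(graph, "node", attrib={"id": node_id})
        ET.SubElement(node, "data", attrib={"key": "label"}).text = labels.get(node_id, "")

    for i, (child, parent) in enumerate(edges):
        ET.SubElement(graph, "edge", attrib={"id": f"e{i}", "source": child, "target": parent})

    return ET.tostring(root, encoding="unicode")
```

The web app expects the compressed efo_graph.graphml.gz, so the returned string would still need to be written out compressed, for example with Python's gzip.open(path, "wt").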
Clean traits:
- gwascat_trait.R
Mapped genes: separate mapped genes into upstream and downstream genes:
- snp2gene_mapped.pl
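Per the GWAS Catalog file documentation, the MAPPED_GENE field lists the gene(s) containing the SNP separated by commas, and for an intergenic SNP lists the upstream and downstream genes separated by a hyphen. A minimal Python sketch of that split follows. It is not a port of snp2gene_mapped.pl, and it assumes the hyphen is written with surrounding spaces (" - ") so that symbols such as HLA-DRB1 are not split. The function name is hypothetical.

```python
from typing import Dict, List


def split_mapped_genes(mapped_gene: str) -> Dict[str, List[str]]:
    """Classify a MAPPED_GENE-style value into in-gene, upstream and downstream genes."""
    field = (mapped_gene or "").strip()
    if " - " in field:
        # Intergenic SNP: "UPSTREAM - DOWNSTREAM" (assumed order and spacing).
        upstream, downstream = (part.strip() for part in field.split(" - ", 1))
        return {"in_gene": [], "upstream": [upstream], "downstream": [downstream]}
    # SNP inside one or more genes: comma-separated list.
    genes = [g.strip() for g in field.split(",") if g.strip()]
    return {"in_gene": genes, "upstream": [], "downstream": []}
```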
Get iCite Relative Citation Ratios (RCRs) for studies via their PMIDs:
- python3 -m BioClients.icite.Client get_stats
Get Ensembl annotations for mapped genes via their Ensembl IDs:
- python3 -m BioClients.ensembl.Client get_info
Get IDG TCRD gene annotations:
- python3 -m BioClients.idg.tcrd.Client listTargets
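The iCite and Ensembl steps each start from a list of identifiers (PMIDs, Ensembl gene IDs). A small Python helper that extracts the distinct values of one TSV column is sketched below. It assumes the BioClients commands are fed a plain list of IDs, and the column names used with it (PUBMEDID, a hypothetical ensemblId) are illustrative; check each client's own usage message for how it actually takes input.

```python
import csv
import io
from typing import List


def unique_ids(tsv_text: str, column: str) -> List[str]:
    """Distinct non-empty values of one TSV column, in first-seen order.

    Cells holding several comma-separated IDs are split before deduplication.
    """
    seen: dict = {}
    # QUOTE_NONE: treat quote characters inside free-text fields as ordinary text.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        for value in (row.get(column) or "").split(","):
            value = value.strip()
            if value:
                seen.setdefault(value, None)
    return list(seen)
```

Applied to the text of gwascat_gwas.tsv with column "PUBMEDID", this would list each study PMID once, assuming the cleaned studies file keeps the Catalog's PUBMEDID column.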
Run the commands in Go_gwascat_DbCreate.sh to build the MySQL database. This step also writes the file gwas_counts.tsv.
Pre-process and filter. Studies, genes and traits may be removed due to insufficient evidence, and the reasons are recorded:
- tiga_gt_prepfilter.R
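One way to keep removal reasons alongside the filtered data is to apply named rules in order and record the first rule each record fails. The Python sketch below shows only that pattern; the example rule (dropping studies with no mapped genes, via a hypothetical n_genes field) is not the criterion set used by tiga_gt_prepfilter.R.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]
Rule = Tuple[str, Callable[[Record], bool]]  # (reason, predicate that must hold to keep)


def filter_with_reasons(
    records: Iterable[Record], rules: List[Rule], key: str
) -> Tuple[List[Record], Dict[object, str]]:
    """Keep records passing every rule; log the first failed rule's reason per removed record."""
    kept: List[Record] = []
    removed: Dict[object, str] = {}
    for rec in records:
        for reason, keep_if in rules:
            if not keep_if(rec):
                removed[rec[key]] = reason
                break
        else:  # no rule rejected the record
            kept.append(rec)
    return kept, removed


# Hypothetical example rule, not TIGA's actual filtering criteria.
EXAMPLE_STUDY_RULES: List[Rule] = [
    ("no mapped genes", lambda s: s.get("n_genes", 0) > 0),
]
```

The removed dictionary can then be written out next to the filtered table, so every dropped study, gene or trait keeps a stated reason.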
Record provenance (STUDY_ACCESSION, PUBMEDID) for gene-trait pairs:
- tiga_gt_provenance.R
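Provenance amounts to grouping association rows by gene-trait pair and keeping the distinct (STUDY_ACCESSION, PUBMEDID) pairs behind each one. A minimal Python sketch, assuming rows are dictionaries and using hypothetical GENE and TRAIT column names:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def gt_provenance(
    rows: Iterable[Dict[str, str]], gene_col: str = "GENE", trait_col: str = "TRAIT"
) -> Dict[Pair, List[Pair]]:
    """Map each (gene, trait) pair to its sorted, distinct (STUDY_ACCESSION, PUBMEDID) sources."""
    sources: Dict[Pair, Set[Pair]] = defaultdict(set)
    for row in rows:
        pair = (row[gene_col], row[trait_col])
        sources[pair].add((row["STUDY_ACCESSION"], row["PUBMEDID"]))
    return {pair: sorted(found) for pair, found in sources.items()}
```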
Generate variables, statistics and evidence features for gene-trait pairs:
- tiga_gt_variables.R
Score and rank gene-trait pairs based on selected variables:
- tiga_gt_stats.R
The TIGA web app requires these files:
1. gwascat_gwas.tsv
2. filtered_genes.tsv
3. filtered_studies.tsv
4. filtered_traits.tsv
5. gt_provenance.tsv.gz
6. gt_stats.tsv.gz
7. efo_graph.graphml.gz
8. gwascat_release.txt
9. efo_release.txt
10. tcrd_info.tsv
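Before launching the app it can help to confirm that every required file is present. A short Python check follows; the file names come from the list above, while keeping all ten in a single data directory is an assumption about deployment.

```python
from pathlib import Path
from typing import List, Union

# File names as listed for the TIGA web app.
REQUIRED_APP_FILES = [
    "gwascat_gwas.tsv",
    "filtered_genes.tsv",
    "filtered_studies.tsv",
    "filtered_traits.tsv",
    "gt_provenance.tsv.gz",
    "gt_stats.tsv.gz",
    "efo_graph.graphml.gz",
    "gwascat_release.txt",
    "efo_release.txt",
    "tcrd_info.tsv",
]


def missing_app_files(data_dir: Union[str, Path]) -> List[str]:
    """Return required file names that are not regular files under data_dir."""
    base = Path(data_dir)
    return [name for name in REQUIRED_APP_FILES if not (base / name).is_file()]
```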
TIGA download files should be copied to the TIGA Download Directory for automated access.

Notes

Split comma-separated fields and convert text to UTF-8.
Gene-trait association variables ((*) marks variables selected by benchmark validation):
- N_study: number of studies supporting the gene-trait association
- N_snp: number of SNPs involved in the gene-trait association
- N_snpw(*): number of SNPs involved in the gene-trait association, weighted by genomic distance
- RCRAS(*): RCR (Relative Citation Ratio) Aggregated Score
- pValue(*): maximum of the SNP p-values
- OR: median(OR), where OR = odds ratio
- N_beta: number of supporting beta values
- geneNtrait: total number of traits associated with the gene
- traitNgene: total number of genes associated with the trait
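Several of the variables above are simple counts or medians over association rows. The Python below sketches N_study, N_snp, OR, N_beta, geneNtrait and traitNgene. It assumes one row per SNP-gene-trait association with hypothetical column names (GENE, TRAIT, STUDY_ACCESSION, SNP, oddsratio, beta), counts distinct studies and SNPs, and leaves out N_snpw, RCRAS and pValue, whose exact definitions are not spelled out here.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def gt_variables(rows: Iterable[dict]) -> Dict[Tuple[str, str], dict]:
    """Per (gene, trait) pair: N_study, N_snp, OR, N_beta, geneNtrait, traitNgene."""
    studies = defaultdict(set)
    snps = defaultdict(set)
    odds_ratios = defaultdict(list)
    n_beta = defaultdict(int)
    traits_of_gene = defaultdict(set)
    genes_of_trait = defaultdict(set)

    for r in rows:
        gene, trait = r["GENE"], r["TRAIT"]
        pair = (gene, trait)
        studies[pair].add(r["STUDY_ACCESSION"])
        snps[pair].add(r["SNP"])
        if r.get("oddsratio") is not None:
            odds_ratios[pair].append(r["oddsratio"])
        if r.get("beta") is not None:
            n_beta[pair] += 1
        traits_of_gene[gene].add(trait)
        genes_of_trait[trait].add(gene)

    result = {}
    for (gene, trait), study_ids in studies.items():
        ors = odds_ratios.get((gene, trait), [])
        result[(gene, trait)] = {
            "N_study": len(study_ids),
            "N_snp": len(snps[(gene, trait)]),
            "OR": median(ors) if ors else None,  # median(OR)
            "N_beta": n_beta.get((gene, trait), 0),
            "geneNtrait": len(traits_of_gene[gene]),
            "traitNgene": len(genes_of_trait[trait]),
        }
    return result
```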
Gene-trait scores and ranks:
- meanRank: mean rank over the variables selected (*) by benchmark validation
- meanRankScore: 100 - Percentile(meanRank)
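The scoring can be sketched as follows, under stated assumptions: each selected variable is oriented so that larger means stronger evidence, pairs are ranked per variable with 1 as best and ties sharing their average rank, and Percentile(meanRank) is taken as the percentage of pairs whose meanRank is less than or equal to the pair's own. tiga_gt_stats.R may define ranks or percentiles differently.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank in descending order (1 = largest value); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend over the run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mean_rank_scores(variables: Dict[str, Sequence[float]]) -> List[float]:
    """meanRankScore = 100 - Percentile(meanRank), one score per gene-trait pair."""
    per_variable = [average_ranks(vals) for vals in variables.values()]
    n_pairs = len(per_variable[0])
    mean_rank = [sum(r[p] for r in per_variable) / len(per_variable) for p in range(n_pairs)]
    # Assumed Percentile: share of pairs whose meanRank is <= this pair's meanRank.
    sorted_mr = sorted(mean_rank)
    percentile = [100.0 * bisect_right(sorted_mr, x) / n_pairs for x in mean_rank]
    return [100.0 - pct for pct in percentile]
```

Under this definition the pair with the worst meanRank scores 0, and the best scores just under 100.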
The MySQL database is intended to support the transition toward IDG TCRD integration; it is currently not required by the TIGA app.
