A database that brings together the largest published studies of cancer cell line responses to chemical compound treatment, together with the associations between drug sensitivity and multi-omics data.
You can also visit the stable version: mugpeng/OmicsPharDB at v1.0.0 (github.com)
This project has been merged into openbiox/UCSCXenaShiny and will NOT be maintained anymore. UCSCXenaShiny is an R package for interactively exploring UCSC Xena (https://xenabrowser.net/datapages/); book: https://lishensuo.github.io/UCSCXenaShiny_Book; online apps: https://shiny.hiplot.cn/ucsc-xena-shiny/, https://shiny.zhoulab.ac.cn/UCSCXenaShiny.
Alternatively, visit XenaShiny (hiplot.cn) and use the Pharmacogenomics analysis module.
Our work has been published: Facilitating integrative and personalized oncology omics analysis with UCSCXenaShiny | Communications Biology (nature.com)
Please cite it if you use this project in your study:
Li, S., Peng, Y., Chen, M. et al. Facilitating integrative and personalized oncology omics analysis with UCSCXenaShiny. Commun Biol 7, 1200 (2024). https://doi.org/10.1038/s42003-024-06891-2
Feel free to contact me if you find any bugs or have any suggestions. :)
Email: [email protected], [email protected]
Long-term goals:
- drug combination
- more data
- database
- in vivo
- package useful functions into an R package for users who code
- feature combination
Important:
- specific type
- compare after grouping, subtype comparison (add into omic-drug pair)
- package functions for data scientists
- interrupt operations without redundant calculation
- multithreading or other ways to accelerate
- too many object names (duplicated object names)
- check and comment backend code
- recheck the drug and omics data for errors or omissions
- cell and drug filter, t-SNE plot, imputation
- upload consensus processed data
- download button
- manual book
- significant pairs
- multicores?
- deploy to GitHub
Others:
- change name ccle_exp into ccle_mRNA
- rewrite DrugOmicPair part
- specific test page (one test per function)
- check discrete data (cells without omics info in sensitivity comparison)
- check drug anno files, cell anno files
- drug similarity
- sensitivity and omics from different projects
This repository is archived.
- Formalize the names on each page
- Add some useful descriptions for users
- Allow user to define the thresholds to filter the pairs
- change the module name from "Features database significant analysis" to "Scaling features associations analysis"
Good news: you can also use the full version by visiting XenaShiny (hiplot.cn).
Uploaded the data to Zenodo: https://zenodo.org/records/10553615
Btw, aliyunpan is really hard to use.
Fix the bugs:
1.0: mugpeng/OmicsPharDB at v1.0.0 (github.com)
Deployed a mini version of the project on shiny, GitHub, and a private website.
Version 1.0 has been released, containing data from six projects: CTRP1, CTRP2, PRISM, GDSC1, GDSC2, and gCSI. It also includes three analysis modules offering four functions for further processing and exploration of the data.
For more details, please check Others\DataInfo\Table1.docx in the backend directory.
Scroll down this README to the For developer part to download the raw data, scripts, and other files.
Fork the repository, open the Rproj file locally, and run App.R.
The input data is very large; you can download it from:
OmicsPhar_Front_Input_0122: https://zenodo.org/records/10553615
Then put the input files in the working directory.
Alternatively, you can use this code to automatically download and execute the packaged shiny project. However, please note that the test project utilizes sampled features (only 100):
shiny::runGitHub("OmicsPharDB_github_testRun", "mugpeng")
Or visit the website deployed at shiny:
OmicsPharDB ([email protected])
The website consists of four main sections (pages): 1) Drug-omics pairs analysis; 2) Profile Display; 3) Significant analysis of features database; and 4) Statistics Information, which is the final page. The tour starts from this last page.
Following the principle of least change, AAC values from gCSI are rescaled such that a lower metric indicates higher sensitivity across all datasets. (AAC2 = max(AAC) - AAC)
Lower scores on metric scales mean stronger drug sensitivity.
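The rescaling above can be sketched in a few lines of R. This is a minimal illustration, not the project's backend code: the cell line values are made up, `rescale_aac` is a hypothetical helper name, and whether the maximum is taken per drug or over the whole gCSI dataset is not stated here, so the sketch simply uses the maximum of the vector it receives.

```r
# Hypothetical AAC values for five cell lines treated with one gCSI drug.
aac <- c(A549 = 0.10, HT29 = 0.45, OVCAR3 = 0.30, K562 = 0.80, MCF7 = 0.05)

# Flip the scale as described above: AAC2 = max(AAC) - AAC.
# (Assumption: the maximum is taken over the supplied vector.)
rescale_aac <- function(x) max(x, na.rm = TRUE) - x

aac2 <- rescale_aac(aac)

# The cell line with the largest AAC (most sensitive) now has the smallest value.
names(which.max(aac))
names(which.min(aac2))
```

After the transformation, gCSI values point in the same direction as the other metrics: lower means more sensitive.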
The omics data we collected include mRNA expression, protein data, copy number variation (CNV), gene fusion, methylation, and mutation (gene mutations and specific amino acid changes).
In vitro cell line pharmacogenomics studies can be categorized into CCLE, GDSC, and other projects (produced by different institutions). The CCLE and GDSC projects produce the genomics data; other experiments then generate drug response data using the cell lines of one of these projects.
The cell lines from the Cancer Cell Line Encyclopedia (CCLE) were used in several drug sensitivity projects, including CTRP1, CTRP2, and PRISM. Meanwhile, the Genomics of Drug Sensitivity in Cancer (GDSC) project generated GDSC1 and GDSC2. The Genentech Cell Line Screening Initiative (gCSI) project has its own omics and drug response data.
The PRISM project tests the largest number of drugs, while GDSC1 tests the largest number of cell lines.
Some drugs and cell lines are tested by more than one project. The overlap is typically higher between projects that use the same cell lines.
The cell lines are mainly from lung cancer, colorectal cancer, and ovarian cancer.
Annotations for cell lines and drugs are also provided.
You can search for drugs of interest.
This feature allows users to explore the association between the resistance metric of a selected drug and a certain omic feature. For continuous omics data such as mRNA, methylation, copy number variation, and protein, the Spearman correlation is calculated. For discrete omics data such as gene mutations, specific amino acid changes, or gene fusions, the Wilcoxon test is used to test significance.
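As a rough illustration of the two cases, the sketch below runs both tests with base R on simulated data. All variable names and values are hypothetical and are not taken from the project's scripts.

```r
set.seed(1)
n <- 60

# Hypothetical data: a drug response metric (lower = more sensitive),
# one continuous omic (mRNA expression) and one discrete omic (mutation status).
drug_metric <- rnorm(n)
mrna_expr   <- 0.5 * drug_metric + rnorm(n)
mutated     <- rep(c(TRUE, FALSE), each = n / 2)
drug_metric_mut <- drug_metric + ifelse(mutated, 1, 0)

# Continuous omic: Spearman rank correlation with the drug metric.
cont_res <- cor.test(mrna_expr, drug_metric, method = "spearman")

# Discrete omic: Wilcoxon rank-sum test between mutated and wild-type cells.
disc_res <- wilcox.test(drug_metric_mut[mutated], drug_metric_mut[!mutated])

c(rho = unname(cont_res$estimate), p = cont_res$p.value)
disc_res$p.value
```

The Spearman estimate and the Wilcoxon p-value are the kind of statistics that would be reported for each omics and drug dataset combination.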
The title of each plot indicates the source of the omics and drug response data. For example, a plot titled gdsc_ctrp1 would mean the omics data is from the GDSC project, while the drug sensitivity data is from the CTRP1 project, as mentioned in the initial Statistics Information section. Personally, I think comparing cell data from different organizations (e.g. GDSC vs CCLE) is reasonable for analyzing correlations, as we are primarily interested in examining the relationship between omic features and drug responses, regardless of the original data source. Combining data from multiple sources can provide a more comprehensive view of these relationships.
The upper figure shows that the gene expression of ABCC3 is significantly positively correlated with the drug response metric of YM-155 across all ten dataset combinations. This correlation could potentially be explained by the known function of the ABCC3 gene. Specifically, ABCC3 encodes an ATP-binding cassette transporter protein that is involved in exporting various molecules, including drugs, out of cells via active transport across cell membranes. Given its role in drug efflux, higher ABCC3 expression may correlate with increased efflux and reduced intracellular concentration of YM-155, resulting in greater resistance to the drug's effects. This might explain the observed positive correlation between ABCC3 expression levels and higher metric values (that is, lower sensitivity) for YM-155.
Another example is the TP53 gene mutation and the drug AMG-232: cells with wild-type TP53 show significantly higher sensitivity:
AMG-232 is an inhibitor of the p53-MDM2 interaction. Mutated TP53 may deactivate the suppressor program induced by AMG-232 through disruption of this interaction: p53-family proteins and their regulators: hubs and spokes in tumor suppression | Cell Death & Differentiation (nature.com)
This page consists of two parts.
The first one is features across different types.
This page is designed to detect covariates like cell source types, age, and gender. Currently, only cell source type detection is available. The user chooses a certain drug, and the page returns all datasets that include this drug, visualized as boxplots with a significance test to check whether there is an association between subtypes and the drug sensitivity metric.
Clearly, SNX-2112, a selective Hsp90 inhibitor that potently inhibits multiple myeloma and other hematologic tumors, shows higher sensitivity in leukemia and lymphoma.
For discrete feature types such as gene fusion and mutation (gene mutations and specific amino acid changes), the Chi-squared test is used:
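For a discrete feature, the comparison across cell types comes down to a contingency table. The sketch below is a hedged example using base R's `chisq.test`; the counts and tumor types are invented for illustration and do not come from the database.

```r
# Hypothetical annotation: mutation status of one gene and the tumor type
# of each cell line (60 cell lines in total).
cells <- data.frame(
  mutated = c(rep(TRUE, 30), rep(FALSE, 30)),
  type    = c(rep("lung", 20), rep("colorectal", 10),
              rep("lung", 10), rep("colorectal", 20)),
  stringsAsFactors = FALSE
)

# Contingency table of mutation status by tumor type.
tab <- table(cells$mutated, cells$type)

# Chi-squared test of independence (Yates' correction is applied to 2x2 tables).
res <- chisq.test(tab)

tab
res$p.value
```

A small p-value suggests that the feature is not evenly distributed across the cell types.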
The second part is the profile of drug sensitivity. t-SNE dimensionality reduction plots are generated to compare drugs and to inspect 1) whether two drugs with similar targets show different sensitivity, or 2) whether two drugs with different targets show similar sensitivity. In addition, the median versus variance scatter plot tells whether a drug has a wide range of sensitivity across cell lines and where its sensitivity ranks in the database.
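The median versus spread summary can be sketched as follows. This is an assumption-laden example: the matrix layout (drugs in rows, cell lines in columns), the drug names, and the use of `mad` as the spread measure (suggested by the MAD&MEDIAN plot name) are illustrative choices, not the project's confirmed implementation.

```r
set.seed(42)

# Hypothetical sensitivity matrix: rows are drugs, columns are cell lines;
# lower values mean higher sensitivity. NA marks untested pairs.
sens <- matrix(rnorm(3 * 50, mean = rep(c(-1, 0, 1), times = 50)),
               nrow = 3, dimnames = list(c("drugA", "drugB", "drugC"), NULL))
sens[2, 1:5] <- NA

# Per-drug median (overall potency) and MAD (spread across cell lines).
profile <- data.frame(
  drug   = rownames(sens),
  median = apply(sens, 1, median, na.rm = TRUE),
  mad    = apply(sens, 1, mad, na.rm = TRUE),
  stringsAsFactors = FALSE
)

# Rank drugs from most to least sensitive by median.
profile <- profile[order(profile$median), ]
profile
```

Plotting `median` against `mad` gives one point per drug: a low median suggests a broadly effective drug, and a high MAD suggests that the response varies strongly between cell lines.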
For example, the MAD&MEDIAN plot indicates that VINCRISTINE is an effective drug in CTRP2, GDSC2, and PRISM. Indeed, it is an FDA-approved clinical drug for many types of tumors.
However, it is not near the top in gCSI:
This analysis module helps users conduct, at a large scale, significance tests between a targeted feature (a drug or an omic) and all the features in a particular feature database, grouped by the databases they were collected from.
The effect size and p-value are calculated depending on the data types:
- For continuous features compared to continuous databases (e.g. drug A levels versus all CNV features), the Pearson correlation coefficient R is used, ranging from -1 to 1.
- For discrete features compared to discrete databases (e.g. TP53 mutation events versus all collected gene fusions), the odds ratio is used. An odds ratio >1 indicates the selected feature has a higher probability of the observed association/events in the database. The p-value is calculated using the Chi-squared test.
- For discrete features compared to continuous databases (or continuous features compared to discrete databases), the log2 fold change (events/wildtype) is used as the effect measure. The p-value is generated from a Wilcoxon test.
A feature-database pair will be considered statistically significant if both of the following criteria are met:
- The absolute value of the effect size is greater than the default threshold (0.2, 4, and 4 for the three cases above, respectively). You can change the threshold for each case.
- The p-value is less than 0.05.
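The two criteria can be written as a small filter. The sketch below uses the default thresholds quoted above; the function name `is_significant`, the type labels, and the example rows are hypothetical and only illustrate the rule.

```r
# Default thresholds from the text; the type labels are names chosen for this sketch.
default_cutoff <- c(correlation = 0.2, odds_ratio = 4, log2fc = 4)

# A pair is significant when |effect| exceeds its cutoff AND p < alpha.
is_significant <- function(effect, p_value, type,
                           cutoff = default_cutoff, alpha = 0.05) {
  abs(effect) > cutoff[type] & p_value < alpha
}

# Hypothetical results for four feature-database pairs.
pairs <- data.frame(
  feature = c("geneA", "geneB", "geneC", "geneD"),
  type    = c("correlation", "correlation", "odds_ratio", "log2fc"),
  effect  = c(0.35, -0.10, 5.2, -4.5),
  p_value = c(0.001, 0.001, 0.20, 0.01),
  stringsAsFactors = FALSE
)

pairs$significant <- unname(is_significant(pairs$effect, pairs$p_value, pairs$type))
pairs
```

Here geneB fails the effect size criterion and geneC fails the p-value criterion, so only geneA and geneD are labeled significant.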
As an example, we look for mRNAs potentially related to Lapatinib:
The frequency table has two columns. The frequency column counts the number of pairs labeled as significant across all databases containing this pair. The proportion column is the fraction of significant pairs among all pairs. You can choose the topmost features for further examination in the result table.
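One way to read the two columns is sketched below, assuming one row per feature and database combination with a logical significance flag. The feature names, database labels, and the exact definition of the proportion are assumptions made for illustration.

```r
# Hypothetical per-database results for three features against one drug.
res <- data.frame(
  feature     = c("geneA", "geneA", "geneA", "geneB", "geneB", "geneC"),
  database    = c("gdsc_gdsc1", "ccle_ctrp2", "ccle_prism",
                  "gdsc_gdsc1", "ccle_ctrp2", "gdsc_gdsc1"),
  significant = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Frequency: number of databases where the pair is significant.
# Proportion: that number divided by the databases containing the pair.
frequency  <- tapply(res$significant, res$feature, sum)
proportion <- tapply(res$significant, res$feature, mean)

freq <- data.frame(
  feature    = names(frequency),
  frequency  = as.vector(frequency),
  proportion = as.vector(proportion),
  stringsAsFactors = FALSE
)

# Put the most consistently significant features first.
freq <- freq[order(-freq$frequency, -freq$proportion), ]
freq
```

Sorting by frequency and then by proportion puts features that are significant in every database that contains them at the top of the table.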
For example, we are interested in the CDH1 gene, which encodes a classical cadherin of the cadherin superfamily. Mutations in this gene are correlated with gastric, breast, colorectal, thyroid and ovarian cancer. Loss of function of this gene is thought to contribute to cancer progression by increasing proliferation, invasion, and/or metastasis, as described by GeneCards: CDH1 Gene - GeneCards | CADH1 Protein | CADH1 Antibody
We can search for this gene using the search box in the top right corner. The results indicate that higher expression of CDH1 is correlated with increased sensitivity to LAPATINIB. As drug resistance metrics are negatively correlated with drug sensitivity, a higher CDH1 expression level tends to predict greater LAPATINIB sensitivity.
A downloadable button is also provided, allowing users to access a CSV file containing the data. This CSV file can then be used for additional analyses or data processing as needed.
A simple online search reveals that CDH1 is related to ERBB2, and ERBB2 is a validated target of LAPATINIB. This suggests that CDH1 expression levels may help determine a tumor's response to LAPATINIB treatment, potentially through its relationship to the drug's primary target, ERBB2.
By the way, we can also double check it through Drugs-omics pairs analysis module:
- Please be patient
The features database significant analysis module may take a long time.
- GDSC
mRNA array expression:
Home (cancerrxgene.org) : Cell_line_RMA_proc_basalExp.txt
other omics:
https://orcestra.ca/pset/10.5281/zenodo.7829919
GDSC1,2 drug AUC:
https://depmap.org/portal/download/all/, select "Sanger GDSC1 and GDSC2" dataset.
- CCLE
All Omics:
https://depmap.org/portal/download/all/
CTRP1,2 drug:
https://portals.broadinstitute.org/ctrp.v1/
https://portals.broadinstitute.org/ctrp.v2.1/
Index of /Public/Broad (nih.gov)
CTD² Data from The Broad Institute - NCI (cancer.gov)
PRISM drug:
https://depmap.org/portal/download/all/, select "PRISM Repurposing 19Q4 Primary Files" dataset.
- gCSI
Both omics and drug data are from Orcestra:
https://orcestra.ca/pset/10.5281/zenodo.4737437
For more details, please check Others\DataInfo\Table1.docx.
More details on the raw data, scripts, and backend preprocessing can be downloaded:
OmicsPharBackend_240122: OmicsPhar Extra Data Repository (zenodo.org)
The methodology and implementation details can be found in the preprint (may now be outdated):
├─App.R
├─Input
│ ├─01
│ ├─02
│ ├─03
│ ├─04
│ └─05
├─Log
│ └─图片
├─Modules
├─readme_backup
│ └─图片
├─Script
└─Test
  └─Test_Module
I have modularized my shiny project. The UI, the server part, the module scripts in the Modules directory, and the panels displayed on the website are interconnected with each other:
If you would like to fork this project and add new modules, there are several steps, in short:
- Copy the existing module scripts.
- Change the UI and server for your own needs.
- Create a new tabPanel and use the callModule function to call the corresponding server function.
- Test them until all bugs are eradicated.
- Share your cool new function with your friends, and I encourage you to send me a pull request!
- Celebrate~
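The steps above can be sketched as a minimal module wired into a tab. This is only a skeleton under assumptions: `uiDemoModule`, `serverDemoModule`, the `"demo"` id, and the inputs are hypothetical names, and the real modules in the Modules directory are more involved.

```r
library(shiny)

# UI half of a hypothetical module; NS() namespaces the input/output ids.
uiDemoModule <- function(id) {
  ns <- NS(id)
  tagList(
    selectInput(ns("drug"), "Drug", choices = c("drugA", "drugB")),
    textOutput(ns("picked"))
  )
}

# Server half, written in the callModule() style mentioned above.
serverDemoModule <- function(input, output, session) {
  output$picked <- renderText(paste("Selected:", input$drug))
}

# Add the module to the page as a new tab panel.
ui <- navbarPage(
  "Demo",
  tabPanel("New module", uiDemoModule("demo"))
)

# Call the module's server function with the same id used in the UI.
server <- function(input, output, session) {
  callModule(serverDemoModule, "demo")
}

# Launch only in an interactive session.
if (interactive()) shinyApp(ui, server)
```

The id passed to the UI function and to `callModule` must match, otherwise the module's inputs and outputs will not be connected.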
You can also use the Test directory to test both new functions and new modules:
Chinese: #shiny
If you have an interest in shiny, pharmacogenomics, or bioinformatics, and if you're enthusiastic about contributing to the open-source community, feel free to join me.
PS: I'm also open to supporting you in your endeavors. Carry me (I will cling to your coattails)!
I can also offer guidance and provide you with the opportunity to co-author a paper based on your contributions.
Currently, there are several difficulties:
- Multithreading or other ways to accelerate
The Features database significant analysis module is quite time-consuming.
Statistical calculations must be performed between the selected feature and all the feature data in the intersected database. Initially, I used snowfall for parallel computing, but I abandoned it due to its low efficiency during the launch stage (it needs to be initialized every time).
Besides, snowfall carries a potential risk of errors induced by garbage collection when it is interrupted.
Similar question: r - Snowfall sfApply() is slower than apply() - Stack Overflow
Are there any alternative methods to enhance the calculation speed?
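One possible direction, offered as a suggestion rather than something the project is known to use, is to vectorize the continuous versus continuous case instead of parallelizing it. The sketch below computes all Pearson correlations in a single `cor` call and derives the p-values from the t distribution. It assumes complete data without missing values; the matrix layout and names are hypothetical.

```r
set.seed(1)
n_cells <- 80

# Hypothetical data: one selected feature and 200 candidate features (rows)
# measured on the same 80 cell lines (columns), with no missing values.
target   <- rnorm(n_cells)
features <- matrix(rnorm(200 * n_cells), nrow = 200,
                   dimnames = list(paste0("feat", 1:200), NULL))

# All Pearson correlations in one call instead of 200 separate tests.
r <- cor(t(features), target)[, 1]

# Two-sided p-values from the t distribution with n - 2 degrees of freedom.
t_stat <- r * sqrt((n_cells - 2) / (1 - r^2))
p <- 2 * pt(-abs(t_stat), df = n_cells - 2)

# Spot check against cor.test() for the first feature.
ref <- cor.test(features[1, ], target)
all.equal(unname(r[1]), unname(ref$estimate))
all.equal(unname(p[1]), ref$p.value)
```

With missing values the number of usable cell lines differs per feature, so the degrees of freedom would need to be computed per row; the rank-based and discrete cases would need a different treatment.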
- Interrupt operations without redundant calculation
How can shiny stop a running calculation when the user switches to another operation?
For example, if the user selects an unwanted pair, how can the calculation be interrupted without unnecessary waiting or reopening the project?
- Out of memory
I decided to deploy the project to my own server with 2 cores and 2GB of memory, as well as to ShinyApp. However, both encountered a similar error while loading the packages:
Interestingly, I have successfully deployed a similar project on ShinyApp without encountering this error: Leukemia Multi-Omics_Drugs Sensitivity Database ([email protected]), even though it also required loading the plotly package.
After several attempts, it has become clear that the reason for the error is the large size of the full dataset, which exceeds the capacity of my small memory server. This conclusion is drawn from the fact that the project runs successfully with the test data. However, a new problem arises: if I still wish to run the project on my limited server, are there any alternative solutions? For instance, could utilizing RMySQL be a viable tactic?
- Cannot find the object
I found that some objects created in the main App.R script cannot be accessed from the module scripts.
For example, I create an object in the global environment:
And use it in the module:
But the aforementioned error appears.
- ggplot Error in self$geom$rename_size && "size" %in% names(plot$mapping)
The error appeared after ggplot2 was updated to the latest version:
Reinstalling version 3.4.4 solved the error.
Index of /src/contrib/Archive/ggplot2 (r-project.org)
install.packages("ggplot2_3.4.4.tar.gz", repos = NULL, type = "source")