OmicsPharDB

A database that integrates the largest published studies of cancer cell line responses to chemical compound treatment, and explores the association between drug sensitivity and multi-omics features.

You can also visit the stable version: mugpeng/OmicsPharDB at v1.0.0 (github.com)

This project has been merged into openbiox/UCSCXenaShiny (📊 An R package for interactively exploring UCSC Xena https://xenabrowser.net/datapages/; Book: https://lishensuo.github.io/UCSCXenaShiny_Book; App online: https://shiny.hiplot.cn/ucsc-xena-shiny/, https://shiny.zhoulab.ac.cn/UCSCXenaShiny), and this repository will NOT be maintained anymore.

Or visit XenaShiny (hiplot.cn) and use the Pharmacogenomics analysis module.

Our work has been published: Facilitating integrative and personalized oncology omics analysis with UCSCXenaShiny | Communications Biology (nature.com)

Please cite it if you use it in your study:

Li, S., Peng, Y., Chen, M. et al. Facilitating integrative and personalized oncology omics analysis with UCSCXenaShiny. Commun Biol 7, 1200 (2024). https://doi.org/10.1038/s42003-024-06891-2

Contact with me

Feel free to talk with me if you find any bugs or have any suggestions. :)

Email: [email protected], [email protected]

TODO

long time goal:

drug combination
more data
- database
- in vivo
make useful function into R package for coding users
features combination

important:

specific type
- compare after group, subtype compare (add into omic-drug pair)
make function into package for data scientist
interrupt opt without redundant calculation
multithread or others to accelerate
too many object names (duplicated object names)
Check and comment backend code
- Recheck the drug, omics data, if error or omit
- cell and drug filter, tsne plot, impute
~~Upload consensus processed data~~
~~download button~~
~~manual book~~
~~significant pairs~~
- multicores?
~~deploy to github~~

others:

change name ccle_exp into ccle_mRNA
rewrite DrugOmicPair part
~~Specific test page(per function per test)~~
check discrete data (cells without omics info in sensitivity comparison)
check drug anno files, cell anno files
~~drug similarity~~
~~sensitivity and omics from different projects~~

Update

09/30/24

This repository is archived.

Happy to announce that our work has been published: Facilitating integrative and personalized oncology omics analysis with UCSCXenaShiny | Communications Biology (nature.com)

03/02/24

Formalize the names on each page
Add some useful descriptions for users
Allow users to define the thresholds used to filter the pairs
Change the module name from "Features database significant analysis" to "Scaling features associations analysis"

Good news: you can also use the full version by visiting XenaShiny (hiplot.cn).

01/22/24

Upload data to zenodo: https://zenodo.org/records/10553615

Btw, aliyunpan is really hard to use:

Fix the bugs:

1.0: mugpeng/OmicsPharDB at v1.0.0 (github.com)

01/19/24

Deploy the project mini version on shiny, github, private website.

01/17/24

Version 1.0 has been released, containing data from six projects: CTRP1, CTRP2, PRISM, GDSC1, GDSC2, and gCSI. It also includes three analysis modules offering four functions for further processing and exploration of the data.

For more details, please check Others\DataInfo\Table1.docx in the backend directory:

Scroll down to the For developers section of this readme to download the raw data, scripts, and other files.

How to run

Full version

Fork the repository and clone it locally, open the Rproj file, and run App.R.

The input data is very large; you can download it from:

OmicsPhar_Front_Input_0122: https://zenodo.org/records/10553615

Then put the input files in the working directory.

Good news: you can also use the full version by visiting XenaShiny (hiplot.cn).

Test version (may be outdated)

Alternatively, you can use this code to automatically download and execute the packaged shiny project. However, please note that the test project utilizes sampled features (only 100):

shiny::runGitHub("OmicsPharDB_github_testRun", "mugpeng")

Or visit the website deployed at shiny:

OmicsPharDB ([email protected])

Tutorial

The website consists of four main sections (pages): 1) Drugs-omics pairs analysis; 2) Profile Display; 3) Scaling features associations analysis (formerly "Features database significant analysis"); and 4) Statistics Information, which is the final page. The tour starts from this last page.

Following the principle of least change, AAC values from gCSI are rescaled as AAC2 = max(AAC) - AAC, so that a lower metric indicates higher sensitivity across all datasets.

In other words, lower scores on every metric scale mean stronger drug sensitivity.
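The rescaling above can be sketched in a few lines of R. This is a minimal illustration with made-up AAC values, not the project's preprocessing script; the names `aac` and `aac2` and the cell line values are hypothetical, and the maximum is assumed to be taken over the values shown.

```r
# Hypothetical raw AAC values for one drug across five cell lines.
# In raw AAC, a larger value means a more sensitive cell line.
aac <- c(A549 = 0.10, HT29 = 0.45, MCF7 = 0.30, K562 = 0.80, PC3 = 0.05)

# Flip the scale so that lower values mean higher sensitivity,
# matching the direction of the other metrics: AAC2 = max(AAC) - AAC.
aac2 <- max(aac, na.rm = TRUE) - aac

# The cell line with the largest raw AAC now has the smallest value.
names(which.min(aac2))
```

After this transformation the ordering of cell lines is simply reversed, so rank-based statistics keep their magnitude and only change sign.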

Statistics Information

The omics data we collected include mRNA expression, protein data, copy number variation (CNV), gene fusion, methylation, and mutation (gene-level mutations and specific amino acid changes).

In vitro cell line pharmacogenetics studies can be grouped into CCLE, GDSC, and other projects (produced by different institutions). The CCLE and GDSC projects produce the genomics data; other experiments then generate drug response data using the cell lines of one of these projects.

The cell lines from the Cancer Cell Line Encyclopedia (CCLE) were used by several drug sensitivity projects, including CTRP1, CTRP2, and PRISM. Meanwhile, the Genomics of Drug Sensitivity in Cancer (GDSC) project generated GDSC1 and GDSC2. The Genentech Cell Screening Initiative (gCSI) project has its own omics and drug response data.

The PRISM project tests the highest number of drugs, while GDSC1 tests the largest number of cell lines.

Different projects have tested some of the same drugs and cell lines. The overlap is typically higher between projects that use the same cell line collection.

The cell lines are mainly from lung cancer, colorectal cancer, and ovarian cancer:

Annotations are also available for cell lines and drugs:

You can search for drugs of interest.

Main Function

1) Drugs-omics pairs analysis

This feature allows users to explore the association between the resistance metric of a selected drug and a given omics feature. For continuous omics data (mRNA, methylation, copy number variation, protein), the Spearman correlation is calculated. For discrete omics data (mutated genes, mutation sites, or gene fusions), the Wilcoxon test is used to test significance.
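As a rough illustration of the two tests named above, the sketch below runs them in base R on simulated data. The variable names (`drug_metric`, `mrna_expr`, `mutated`) and the simulated values are assumptions made for the example; the app's own code may organize this differently.

```r
set.seed(1)
n <- 60

# Hypothetical discrete feature: 20 mutated and 40 wild-type cell lines.
mutated <- rep(c(TRUE, FALSE), times = c(20, 40))

# Hypothetical drug metric, shifted upward in mutated cell lines.
drug_metric <- rnorm(n) + ifelse(mutated, 1.5, 0)

# Hypothetical continuous feature that tracks the drug metric.
mrna_expr <- 0.6 * drug_metric + rnorm(n, sd = 0.8)

# Continuous omics feature: Spearman rank correlation with the drug metric.
cont_res <- cor.test(mrna_expr, drug_metric, method = "spearman")

# Discrete omics feature: Wilcoxon rank-sum test, mutated vs wild type.
disc_res <- wilcox.test(drug_metric[mutated], drug_metric[!mutated])

c(rho    = unname(cont_res$estimate),
  p_cont = cont_res$p.value,
  p_disc = disc_res$p.value)
```

Both tests are rank-based, so they do not assume normally distributed drug metrics.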

The title of each plot indicates the source of the omics and drug response data. For example, a plot titled gdsc_ctrp1 means the omics data is from the GDSC project, while the drug sensitivity data is from the CTRP1 project, as described in the Statistics Information section. Personally, I think comparing cell line data from different organizations (e.g. GDSC vs CCLE) is reasonable for analyzing correlations, as we are primarily interested in the relationship between omics features and drug responses, regardless of the original data source. Combining data from multiple sources can provide a more comprehensive view of these relationships.

The figure above shows that ABCC3 gene expression is significantly positively correlated with the YM-155 drug metric, i.e. with resistance to YM-155, across all ten dataset combinations. This correlation could potentially be explained by the known function of the ABCC3 gene. ABCC3 encodes an ATP-binding cassette transporter protein that exports various molecules, including drugs, out of cells via active transport across cell membranes. Given its role in drug efflux, higher ABCC3 expression may increase efflux and reduce the intracellular concentration of YM-155, resulting in greater resistance to the drug's effects. This might explain the observed positive correlation between ABCC3 expression levels and higher metric values for YM-155.

Another example is TP53 gene mutation and the drug AMG-232: cell lines with wild-type TP53 are clearly and significantly more sensitive:

AMG-232 is an inhibitor of the p53-MDM2 interaction. Mutated TP53 may deactivate the suppressor program induced by AMG-232 through disruption of this interaction: p53-family proteins and their regulators: hubs and spokes in tumor suppression | Cell Death & Differentiation (nature.com)

2) Profile Display

This page consists of two parts.

features across different types

The first one is features across different types.

This page is designed to detect covariates such as cell source type, age, and gender. Currently, only cell source type detection is available. The user chooses a drug, and the page returns every dataset that includes this drug, visualized as boxplots with a significance test that checks whether subtypes are associated with the drug sensitivity metric.

SNX-2112, a selective Hsp90 inhibitor that potently inhibits tumor growth in multiple myeloma and other hematologic tumors, clearly shows higher sensitivity in leukemia and lymphoma.

For discrete feature types such as gene fusion and mutation (gene-level mutations and specific amino acid changes), the Chi-squared test is used:
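A minimal sketch of such a Chi-squared test in base R is shown below. The tissue labels, mutation probabilities, and variable names are invented for the example and do not come from the database.

```r
set.seed(2)

# Hypothetical cell source types for 120 cell lines.
tissue <- rep(c("lung", "colorectal", "ovarian"), each = 40)

# Hypothetical mutation status whose frequency differs by tissue.
mut_prob <- c(lung = 0.6, colorectal = 0.3, ovarian = 0.2)
mutated <- rbinom(length(tissue), size = 1, prob = mut_prob[tissue]) == 1

# Contingency table of tissue (rows) by mutation status (columns).
tab <- table(tissue, mutated)

# Chi-squared test of independence between tissue and mutation status.
res <- chisq.test(tab)
res$p.value
```

A small p-value here would suggest that the discrete feature is unevenly distributed across cell source types.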

Profile of drug sensitivity

The second part is the profile of drug sensitivity. t-SNE dimensionality reduction plots are generated to compare drugs and to inspect 1) whether two drugs with similar targets show different sensitivity profiles, or 2) whether two drugs with different targets show similar sensitivity profiles. In addition, the median versus variability (MAD) scatter plot shows whether a drug has a wide range of sensitivity across cell lines, and where its sensitivity ranks in the database.
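The per-drug summary behind such a scatter plot could be computed roughly as follows. This sketch assumes that MAD stands for the median absolute deviation (base R `mad()`); the matrix `sens` and the drug names are hypothetical.

```r
set.seed(3)

# Hypothetical sensitivity matrix: rows are cell lines, columns are drugs.
# Lower values mean higher sensitivity.
sens <- cbind(
  DRUG_A = rnorm(50, mean = 0.2, sd = 0.05),  # effective in most cell lines
  DRUG_B = rnorm(50, mean = 0.6, sd = 0.30),  # highly variable response
  DRUG_C = rnorm(50, mean = 0.9, sd = 0.05)   # weak in most cell lines
)

# One row per drug: typical sensitivity (median) and spread (MAD).
drug_profile <- data.frame(
  drug   = colnames(sens),
  median = apply(sens, 2, median, na.rm = TRUE),
  mad    = apply(sens, 2, mad, na.rm = TRUE)
)

# Rank drugs so that the most broadly effective ones come first.
drug_profile <- drug_profile[order(drug_profile$median), ]
drug_profile
```

Plotting `median` against `mad` then separates drugs that are uniformly potent from those whose effect depends strongly on the cell line.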

For example, the MAD&MEDIAN plot indicates that VINCRISTINE is an effective drug in CTRP2, GDSC2, and PRISM. Indeed, it is an FDA-approved clinical drug for many tumor types.

But it is not at the top in gCSI:

3) Scaling features associations analysis

This analysis module runs large-scale significance tests between a target feature (a drug or an omics feature) and all the features in a particular feature dataset, grouped by the databases they were collected from.

The effect size and p-value are calculated depending on the data types:

For continuous features compared to continuous datasets (e.g. drug A levels versus all CNV features), the Pearson correlation coefficient R is used, ranging from -1 to 1.
For discrete features compared to discrete datasets (e.g. TP53 mutation events versus all collected gene fusions), the odds ratio is used. An odds ratio >1 indicates the selected feature has a higher probability of the observed association/events in the database. The p-value is generated from a Chi-squared test.
For discrete features compared to continuous datasets (or continuous features compared to discrete datasets), the log2 fold change (events/wildtype) is used as the effect measure. The p-value is calculated using the Wilcoxon test.
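The three effect measures listed above can be sketched in base R as follows. This is an illustration on simulated data, under the assumptions that the discrete-versus-discrete case is tested with a Chi-squared test, that the discrete-versus-continuous case is tested with a Wilcoxon test, and that the fold change is the ratio of group means; all variable names are hypothetical.

```r
set.seed(4)
n <- 80

# Hypothetical discrete features.
mutated <- rep(c(TRUE, FALSE), each = n / 2)
fusion  <- rbinom(n, 1, ifelse(mutated, 0.6, 0.2)) == 1

# Hypothetical continuous features (kept positive so a ratio is defined).
drug <- rlnorm(n, meanlog = ifelse(mutated, 1, 0))
cnv  <- log(drug) + rnorm(n)

# 1) Continuous vs continuous: Pearson correlation coefficient.
r_res <- cor.test(drug, cnv, method = "pearson")

# 2) Discrete vs discrete: odds ratio from the 2x2 table, Chi-squared p-value.
tab <- table(mutated, fusion)
odds_ratio <- (tab["TRUE", "TRUE"] * tab["FALSE", "FALSE"]) /
              (tab["TRUE", "FALSE"] * tab["FALSE", "TRUE"])
p_or <- chisq.test(tab)$p.value

# 3) Discrete vs continuous: log2 fold change (events/wildtype), Wilcoxon p-value.
log2fc <- log2(mean(drug[mutated]) / mean(drug[!mutated]))
p_fc   <- wilcox.test(drug[mutated], drug[!mutated])$p.value

c(r = unname(r_res$estimate), odds_ratio = odds_ratio, log2fc = log2fc)
```

In practice the 2x2 odds ratio is undefined when a cell of the table is zero, so a real implementation would need a rule for that case.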

A feature-database pair is considered statistically significant if both of the following criteria are met:

The absolute value of the effect size is greater than the threshold, which defaults to 0.2, 4, and 4 for the three cases above. You can change it for a specific case.
The p-value is less than 0.05.
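Applying the two criteria to a result table is a one-line filter. The sketch below uses an invented result table for the correlation case with its default cutoff of 0.2; the column names and gene labels are hypothetical.

```r
# Hypothetical results for one target feature against four genes.
res <- data.frame(
  feature = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  effect  = c(0.35, -0.28, 0.10, 0.50),
  p_value = c(0.001, 0.020, 0.0005, 0.300)
)

# User-adjustable thresholds (defaults for the correlation case).
effect_cutoff <- 0.2
p_cutoff      <- 0.05

# A pair is significant only if it passes both criteria.
res$significant <- abs(res$effect) > effect_cutoff & res$p_value < p_cutoff

res[res$significant, ]
```

Here GENE_C fails on effect size despite a small p-value, and GENE_D fails on the p-value despite a large effect.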

We take the search for mRNAs potentially related to Lapatinib as an example:

The frequency table has two columns. The frequency column counts the number of pairs labeled as significant across all databases containing this pair. The proportion column is the fraction of significant pairs among all pairs. You can pick the topmost features for further examination in the result table.
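One way such a frequency table could be derived from per-database results is sketched below. It assumes the proportion is computed per feature over the databases that contain it; the table `pairs`, the gene labels, and the database labels are hypothetical.

```r
# Hypothetical per-database results: one row per feature-database pair.
pairs <- data.frame(
  feature     = c("GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_B", "GENE_C"),
  database    = c("ccle_ctrp1", "ccle_ctrp2", "gdsc_gdsc1",
                  "ccle_ctrp1", "gdsc_gdsc1", "ccle_ctrp2"),
  significant = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

# Count and fraction of significant pairs per feature.
freq <- data.frame(
  feature    = sort(unique(pairs$feature)),
  frequency  = as.vector(tapply(pairs$significant, pairs$feature, sum)),
  proportion = as.vector(tapply(pairs$significant, pairs$feature, mean))
)

# Put the most consistently significant features on top.
freq <- freq[order(-freq$frequency, -freq$proportion), ]
freq
```

Sorting by frequency first favors features that are supported by several independent databases.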

For example, suppose we are interested in the CDH1 gene, a classical cadherin of the cadherin superfamily. Mutations in this gene are correlated with gastric, breast, colorectal, thyroid and ovarian cancer. Loss of function of this gene is thought to contribute to cancer progression by increasing proliferation, invasion, and/or metastasis, as described on GeneCards: CDH1 Gene - GeneCards | CADH1 Protein | CADH1 Antibody

We can search this gene by the search box at the top right edge. The results indicate that higher expression of CDH1 is correlated with increased sensitivity to LAPATINIB. As drug resistance metrics are negatively correlated with drug sensitivity, a higher CDH1 expression level tends to predict greater LAPATINIB sensitivity.

A download button is also provided, allowing users to save the data as a CSV file. This CSV file can then be used for additional analyses or data processing as needed.

A simple online search reveals that CDH1 is related to ERBB2, and ERBB2 is a validated target of LAPATINIB. This suggests that CDH1 expression levels may help determine a tumor's response to LAPATINIB treatment, potentially through its relationship to the drug's primary target, ERBB2.

By the way, we can also double check it through Drugs-omics pairs analysis module:

Tips

Please be patient

The Scaling features associations analysis module (formerly "Features database significant analysis") may take a long time.

For developers

Raw data and back end

GDSC

mRNA array expression:

Home (cancerrxgene.org) : Cell_line_RMA_proc_basalExp.txt

other omics:

https://orcestra.ca/pset/10.5281/zenodo.7829919

GDSC1,2 drug AUC:

https://depmap.org/portal/download/all/, select "Sanger GDSC1 and GDSC2" dataset.

CCLE

All Omics:

https://depmap.org/portal/download/all/

CTRP1,2 drug:

https://portals.broadinstitute.org/ctrp.v1/

https://portals.broadinstitute.org/ctrp.v2.1/

Index of /Public/Broad (nih.gov)

CTD² Data from The Broad Institute - NCI (cancer.gov)

PRISM drug:

https://depmap.org/portal/download/all/, select "PRISM Repurposing 19Q4 Primary Files" dataset.

gCSI

Both omics and drug data are from Orcestra:

https://orcestra.ca/pset/10.5281/zenodo.4737437

For more details, please check the Others\DataInfo\Table1.docx:

More details on the raw data, scripts, and backend preprocessing can be downloaded:

OmicsPharBackend_240122: OmicsPhar Extra Data Repository (zenodo.org)

The methodology and implementation details can be found in the preprint (may now be outdated):

OmicsPharLeuDB: an integrative database for mining pharmacogenomic data in acute lymphoblastic leukemia | bioRxiv

Code structure

├─App.R
├─Input
│  ├─01
│  ├─02
│  ├─03
│  ├─04
│  └─05
├─Log
│  └─图片
├─Modules
├─readme_backup
│  └─图片
├─Script
└─Test
    └─Test_Module

I have modularized my shiny project. The UI, the server part, the module scripts in the Modules directory, and the panels displayed on the website are interconnected with each other:

If you would like to fork this project and add new modules, the steps in short are:

Copy an existing module script.
Change the UI and server for your own needs.
Create a new tabPanel and use the callModule function to call the corresponding server function.
Test them until all bugs are eradicated.
Share your cool new function with your friends, and I encourage you to send pull requests to me!
Celebrate~
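The steps above can be sketched with a toy module. This is a generic Shiny example in the callModule style, not a copy of any script in the Modules directory; `demoUI`, `demoServer`, and the id "demo" are hypothetical names.

```r
library(shiny)

# Module UI: every input/output id is wrapped in the module namespace.
demoUI <- function(id) {
  ns <- NS(id)
  tagList(
    selectInput(ns("drug"), "Drug", choices = c("LAPATINIB", "YM-155")),
    textOutput(ns("picked"))
  )
}

# Module server in the callModule style: (input, output, session).
demoServer <- function(input, output, session) {
  output$picked <- renderText(paste("Selected drug:", input$drug))
}

# Main app: a new tabPanel hosts the module UI ...
ui <- navbarPage(
  "Demo",
  tabPanel("New module", demoUI("demo"))
)

# ... and callModule connects the server function using the same id.
server <- function(input, output, session) {
  callModule(demoServer, "demo")
}

# Uncomment to launch the app interactively.
# shinyApp(ui, server)
```

The id passed to `demoUI()` and to `callModule()` must match, otherwise the module's inputs and outputs will not be connected.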

You can also use the Test directory to test both new functions and new modules:

Ways to learn shiny

Shiny - Welcome to Shiny

Welcome | Mastering Shiny

Chinese: #shiny

Walk with me

If you have an interest in shiny, pharmacogenomics, or bioinformatics, and if you're enthusiastic about contributing to the open-source community, feel free to join me.

ps: I'm also open to supporting you in your endeavors, or to riding on your coattails.

I can also offer guidance and provide you with the opportunity to co-author a paper based on your contributions.

Currently, there are several difficulties:

multithread or others to accelerate

The Scaling features associations analysis module (formerly "Features database significant analysis") is quite time-consuming.

The selected feature has to be tested against every feature in the intersected database. Initially, I used snowfall for parallel computing, but I abandoned it because of its low efficiency at the launch stage (the cluster needs to be initialized every time).

Besides, snowfall may trigger garbage collection errors when a run is interrupted.

similar question: r - Snowfall sfApply() is slower than apply() - Stack Overflow

Are there any alternative methods to enhance the calculation speed?
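One possible direction, offered only as a suggestion and not something the project currently does, is to vectorize the continuous case instead of parallelizing it: `cor()` accepts a matrix, so a single call can correlate the target with every feature, and Pearson p-values follow from the t distribution. The sketch assumes complete data without missing values; all names are hypothetical.

```r
set.seed(5)
n_cells    <- 100
n_features <- 2000

# Hypothetical target feature and feature matrix (cells x features).
target <- rnorm(n_cells)
feature_mat <- matrix(
  rnorm(n_cells * n_features), nrow = n_cells,
  dimnames = list(NULL, paste0("feat", seq_len(n_features)))
)

# One call computes the Pearson correlation with every column.
r <- as.vector(cor(target, feature_mat))

# Two-sided p-values from the t statistic with n - 2 degrees of freedom.
t_stat <- r * sqrt((n_cells - 2) / (1 - r^2))
p      <- 2 * pt(-abs(t_stat), df = n_cells - 2)

res <- data.frame(feature = colnames(feature_mat), r = r, p_value = p)

# Sanity check against cor.test() for the first feature.
ref <- cor.test(target, feature_mat[, 1])
all.equal(res$p_value[1], ref$p.value)
```

With missing values the number of complete observations differs per feature, so the degrees of freedom would have to be computed column by column.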

interrupt opt without redundant calculation

https://stackoverflow.com/questions/30587883/is-it-possible-to-stop-executing-of-r-code-inside-shiny-without-stopping-the-sh

https://stackoverflow.com/questions/34226789/getting-shiny-to-update-the-ui-and-and-run-long-calculation-afterwards

shiny stop calculation when change other operation

For example, if the user selects an unwanted pair, how can the calculation be interrupted without waiting unnecessarily or reopening the project?

Out of memory

I have decided to upload the project to my own server with 2 cores and 2GB of memory, as well as the shinyapp. However, both of them encountered a similar error while loading the packages:

Interestingly, I have successfully deployed a similar project on ShinyApp without encountering this error: Leukemia Multi-Omics_Drugs Sensitivity Database ([email protected]), even though it also required loading the plotly package.

After several attempts, it has become clear that the reason for the error is the large size of the full dataset, which exceeds the capacity of my small memory server. This conclusion is drawn from the fact that the project runs successfully with the test data. However, a new problem arises: if I still wish to run the project on my limited server, are there any alternative solutions? For instance, could utilizing RMySQL be a viable tactic?

Cannot find the object

I found that some objects created in the main App.R script cannot be accessed from the module scripts.

For example, I create an object in the global environment:

And use it in the module:

But the aforementioned error appears.

Bugs

ggplot Error in self$geom$rename_size && "size" %in% names(plot$mapping)

Then ggplot was updated to the latest version:

Reinstalling version 3.4.4 solved the error.

Index of /src/contrib/Archive/ggplot2 (r-project.org)

install.packages("ggplot2_3.4.4.tar.gz",repos=NULL)
