go-cluster

Clustering PPINs by GO similarity scores.

Uses STRINGdb and INet networks, GSE23561 microarray gene expression data, and the GOSemSim and org.Hs.eg.db packages.

Running instructions

First, download and export the PPI data (i.e., link files) into the LINKS/ directory by following the download instructions in LINKS/DOWNLOAD.txt.

To run the pipeline with different gene expression and PPI data, apply the following changes:

Replace the RAW.csv file with your own gene expression data. This is a comma-separated matrix in which each row represents a gene and each column represents a subject (i.e., a person or test condition). See RAW.csv for an example.
Put the PPI file in the LINKS/ directory. This file must be a comma-separated edge list giving the PPI score (i.e., weight) for each gene pair. See LINKS/elinks_inet.csv for an example.
Update the variables in vars.R to match your gene expression and PPI data. You must reassign the ranges (i.e., intervals) so that the control group and each disease group are separated correctly in the gene expression matrix. You must also reset the cutoff value for the PPI scores used in PPIN reduction. In vars.R, you can also change the t-test significance (P_VAL) and the fold-change threshold (FC) used in the DEG analysis.
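
As a rough illustration of how these settings interact, the sketch below applies a per-gene t-test and a fold-change filter to a toy expression matrix. The names CONTROL_COLS, DISEASE_COLS, and find_degs, the toy values, and the assumption that expression values are on a log2 scale are all hypothetical; only P_VAL and FC are names mentioned for vars.R, and the actual logic in degs.R may differ.

```r
# Toy expression matrix: rows are genes, columns are subjects (hypothetical data).
expr <- rbind(
  GENE1 = c(8.0, 8.1, 7.9, 8.0, 11.0, 11.1, 10.9, 11.0),  # large shift
  GENE2 = c(8.0, 8.2, 7.8, 8.1,  8.1,  7.9,  8.0,  8.2),  # no real shift
  GENE3 = c(6.0, 6.1, 5.9, 6.0,  6.4,  6.5,  6.3,  6.4)   # small but consistent shift
)
colnames(expr) <- paste0("S", 1:8)

# Hypothetical settings in the spirit of vars.R.
CONTROL_COLS <- 1:4   # column range of the control group
DISEASE_COLS <- 5:8   # column range of one disease group
P_VAL <- 0.05         # t-test significance threshold
FC    <- 1            # minimum absolute log2 fold change (assumes log2-scale data)

find_degs <- function(expr, control, disease, p_val, fc) {
  # Welch two-sample t-test per gene (disease vs. control)
  p <- apply(expr, 1, function(x) t.test(x[disease], x[control])$p.value)
  # Difference of group means, i.e. the log fold change on log-scale data
  lfc <- rowMeans(expr[, disease, drop = FALSE]) -
    rowMeans(expr[, control, drop = FALSE])
  # Keep genes that pass both the significance and the fold-change filter
  rownames(expr)[p < p_val & abs(lfc) >= fc]
}

degs <- find_degs(expr, CONTROL_COLS, DISEASE_COLS, P_VAL, FC)
print(degs)
```

In this toy data, GENE3 differs consistently between the groups but by less than the fold-change threshold, so only the combination of both filters decides which genes are reported.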

Pipeline

Scripts

vars.R manages the packages, global variables, paths, and I/O files.
degs.R includes the necessary functions to identify differentially expressed genes.
links.R handles reading, preprocessing, and mapping of PPI data.
msLinks.R is responsible for filtering out unmapped and insignificant PPIs.
go.R fetches GO information and calculates the GO similarity scores.
spici.R, mcl.R, and linkcomm.R perform the clustering operations.
bhi.R calculates Biological Homogeneity Index for the clusters.
stability.R calculates stability of disease modules.
validation.R searches for the identified genes in the DEGs of the validation sets.
plot.R is responsible for generating the plots for the obtained results.
stats.R and bhi_stats.R create tables with statistics about the clustering or validation.
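
To make the PPI reduction step more concrete, here is a minimal sketch of filtering a weighted edge list: it drops pairs whose score falls below a cutoff and pairs with an endpoint outside a set of genes of interest. The column names (gene1, gene2, score), the CUTOFF variable, the reduce_links helper, and the toy values are hypothetical and are not taken from links.R or msLinks.R.

```r
# Toy comma-separated edge list, read the same way a file in LINKS/ could be
# (column names and values are hypothetical).
links <- read.csv(text = "gene1,gene2,score
TP53,MDM2,0.95
TP53,EGFR,0.40
EGFR,GRB2,0.88
BRCA1,UNMAPPED1,0.91", stringsAsFactors = FALSE)

CUTOFF <- 0.7                                          # hypothetical score cutoff
genes  <- c("TP53", "MDM2", "EGFR", "GRB2", "BRCA1")   # e.g. mapped genes of interest

reduce_links <- function(links, genes, cutoff) {
  keep <- links$score >= cutoff &      # drop low-scoring (insignificant) pairs
    links$gene1 %in% genes &           # drop pairs with an unmapped first gene
    links$gene2 %in% genes             # drop pairs with an unmapped second gene
  links[keep, , drop = FALSE]
}

reduced <- reduce_links(links, genes, CUTOFF)
print(reduced)
```

Here the TP53/EGFR pair is removed by the cutoff and the BRCA1/UNMAPPED1 pair is removed because one endpoint is not in the gene set.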

Dependencies

It is highly recommended to install all packages required in the vars.R file.

To avoid dependency problems, run the scripts in the following order. Note that all scripts depend on vars.R, which manages the packages as well as the global paths, files, and variables:

vars.R ➡️ degs.R ➡️ links.R ➡️ msLinks.R ➡️ go.R ➡️ spici.R || mcl.R || linkcomm.R ➡️ bhi.R || stability.R ➡️ validation.R ➡️ plot.R || stats.R || bhi_stats.R
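
One way to follow this order from an R session is sketched below. The run_pipeline helper is hypothetical and not part of the repository. The sketch reads "||" as "any of these", picks mcl.R as the clustering step, and runs the remaining alternatives in sequence; that reading is an assumption. Script names follow the spelling used above, so adjust them if the file names in your checkout differ.

```r
# Script order as listed above. Where alternatives are separated by "||",
# this sketch picks one clustering script and runs the others in sequence.
run_order <- c("vars.R", "degs.R", "links.R", "msLinks.R", "go.R",
               "mcl.R",                      # or "spici.R" / "linkcomm.R"
               "bhi.R", "stability.R", "validation.R",
               "plot.R", "stats.R", "bhi_stats.R")

run_pipeline <- function(scripts, dir = ".") {
  paths <- file.path(dir, scripts)
  # Fail early if any script is missing, before running anything
  missing <- scripts[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing scripts: ", paste(missing, collapse = ", "))
  }
  for (p in paths) {
    message("Running ", p)
    # source() evaluates in the global environment by default, so objects
    # created by an earlier script stay visible to the later ones
    source(p, chdir = TRUE)
  }
  invisible(paths)
}

# From the repository root:
# run_pipeline(run_order)
```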

Citation

S. Tenekeci, S. Tekir, Identifying promoter and enhancer sequences by graph convolutional networks, Computational Biology and Chemistry (2024) https://doi.org/10.1016/j.compbiolchem.2024.108040

