Gene2WordClouds

Gene2WordClouds extracts transcription factor (TF)-specific terms from the abstracts of the available PubMed literature and creates a word cloud for each TF. Term frequencies computed across the entire set of literature associated with the queried TFs serve as the background, and term frequencies computed from the literature associated with an individual TF serve as the foreground. The 200 most enriched terms for each individual TF are extracted and displayed in its word cloud.
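The foreground/background comparison can be sketched in a few lines of Python. This is a minimal illustration, not the code in utils: it assumes that the PMID-to-TF links and the stemmed words of each abstract are already held in two hypothetical dictionaries (`tf_to_pmids`, `pmid_to_stems`), and it applies the IDF and Term Frequency-IDF formulas given for abstract2words.py, where both counts are numbers of PMIDs containing a stem.

```python
from collections import Counter


def enrichment_scores(tf_to_pmids, pmid_to_stems):
    """Score stems per TF as foreground PMID count x background IDF.

    tf_to_pmids and pmid_to_stems are hypothetical in-memory structures,
    not the file formats used by the actual scripts.
    """
    # Background: every PMID linked to any of the queried TFs.
    all_pmids = set().union(*tf_to_pmids.values())
    n_total = len(all_pmids)

    # Document frequency: number of background PMIDs containing each stem.
    doc_freq = Counter()
    for pmid in all_pmids:
        doc_freq.update(set(pmid_to_stems.get(pmid, ())))

    # IDF as a plain ratio (no logarithm), following the formula in this README.
    idf = {stem: n_total / df for stem, df in doc_freq.items()}

    scores = {}
    for tf, pmids in tf_to_pmids.items():
        # Foreground: number of this TF's PMIDs containing each stem.
        fg = Counter()
        for pmid in pmids:
            fg.update(set(pmid_to_stems.get(pmid, ())))
        scores[tf] = {stem: count * idf[stem] for stem, count in fg.items()}
    return scores


def top_terms(stem_scores, n=200):
    """Return the n highest-scoring stems, breaking ties alphabetically."""
    return sorted(stem_scores, key=lambda s: (-stem_scores[s], s))[:n]


if __name__ == "__main__":
    # Toy input: two TFs sharing PMID "2".
    tf_to_pmids = {"CTCF": {"1", "2"}, "REST": {"2", "3"}}
    pmid_to_stems = {
        "1": ["insulator", "chromatin", "binding"],
        "2": ["binding", "chromatin"],
        "3": ["neuronal", "binding", "silencing"],
    }
    scores = enrichment_scores(tf_to_pmids, pmid_to_stems)
    print(top_terms(scores["CTCF"], n=2))  # ['chromatin', 'insulator']
```

Because IDF is a plain ratio here, a stem that appears in every PMID gets an IDF of 1 and is ranked only by its foreground count.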

Pipeline

Scheme

Python Scripts

All scripts are located in the utils directory.

uniacc2entrezid.py - convert UniProt accessions to Entrez gene IDs (EntrezIDs).
entrezid2pmids.py - extract all PubMed IDs (PMIDs) associated with the query EntrezIDs.
gene2pmids_stats.py - compute statistics on the PMIDs and EntrezIDs.
pmid2abstract.py - download the abstracts of the papers identified by the PMIDs.
abstract2words.py - tally the words within each abstract, excluding punctuation and stop words, and obtain the stem of each word. Then:
- compute the Inverse Document Frequency (IDF) of each stem, where IDF = total number of PMIDs in the analysis divided by the number of PMIDs containing the stem;
- compute the Term Frequency-IDF of each stem for each TF, where Term Frequency-IDF = number of TF-associated PMIDs containing the stem multiplied by the IDF of the stem.
words2cloud.py - generate a word cloud for each TF from the top 200 words with unique stems (stems are used to remove redundant words such as "ctcf" and "ctcf-binding").
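The tokenization done by abstract2words.py and the "unique stems" selection done by words2cloud.py can be approximated as below. This is a sketch under stated assumptions, not the scripts' code: the stop-word list is a toy stand-in (NLTK, listed among the dependencies, ships an English list via `nltk.corpus.stopwords` and stemmers such as `nltk.stem.PorterStemmer`), the stemmer is passed in as a function, and the tie-breaking rule for words that share a stem is hypothetical. The usage example relies on a hypothetical toy stemmer that keeps the text before the first hyphen, so that the README's "ctcf"/"ctcf-binding" example collapses to one entry.

```python
import re
from typing import Callable, Dict, Set, Tuple

# Toy stop-word list (assumption); a real run would use a full list such as NLTK's.
STOP_WORDS = {"a", "an", "and", "by", "in", "is", "of", "the", "to", "with"}


def abstract_to_stems(text: str, stem: Callable[[str], str]) -> Set[str]:
    """Lowercase an abstract, drop punctuation and stop words, return its stems."""
    # Words are runs of letters/digits, optionally joined by internal hyphens;
    # everything else (punctuation, whitespace) is discarded.
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return {stem(tok) for tok in tokens if tok not in STOP_WORDS}


def select_cloud_words(word_scores: Dict[str, float],
                       stem: Callable[[str], str],
                       n: int = 200) -> Dict[str, float]:
    """Keep the best-scoring word for each stem, then return the top n words."""
    best: Dict[str, Tuple[str, float]] = {}
    for word, score in word_scores.items():
        key = stem(word)
        current = best.get(key)
        # Higher score wins; ties go to the shorter, then alphabetically first, word.
        if current is None or (-score, len(word), word) < (
            -current[1], len(current[0]), current[0]
        ):
            best[key] = (word, score)
    ranked = sorted(best.values(), key=lambda ws: (-ws[1], ws[0]))
    return dict(ranked[:n])


if __name__ == "__main__":
    # Hypothetical toy stemmer: keep the text before the first hyphen.
    def toy_stem(word: str) -> str:
        return word.split("-")[0]

    print(abstract_to_stems("CTCF-binding sites of the insulator.", toy_stem))
    scores = {"ctcf": 5.0, "ctcf-binding": 4.0, "cohesin": 3.5, "insulator": 3.0}
    print(select_cloud_words(scores, toy_stem, n=2))  # {'ctcf': 5.0, 'cohesin': 3.5}
```

The returned dictionary maps each displayed word to its score, which is the form accepted by `WordCloud.generate_from_frequencies` in the wordcloud package, e.g. `WordCloud(max_words=200).generate_from_frequencies(words).to_file("CTCF.png")` (the file name is illustrative).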

Dependencies

Gene2WordClouds requires the following main dependencies:

Package	Version
biopython	1.78
click	7.1.2
click-option-group	0.5.1
distance	0.1.3
fuzzywuzzy	0.18.0
matplotlib	3.3.3
nltk	3.4.4
nltk_data	2019.07.04
numpy	1.22.4
pandas	1.2.1
python	3.8.5
scikit-learn	0.24.1
seaborn	0.11.1
tqdm	4.35.0
unipressed	1.2.0
wordcloud	1.8.1

Conda Environment

All packages are listed in the YAML environment file. A conda environment named wordclouds can be created with the following command:

conda env create -f ./conda/environment.yml

Usage

Application

The script gene2wordclouds.py starts the Gene2WordClouds application with the following required input options:

-i / --identifier - UniProt accession or Entrez gene ID of the gene of interest
--input-file - file containing a list of identifiers
-e / --email - email address

Optional options:

--input-type - identifier type: [ entrezid | uniacc ] (default: entrezid)
--output-dir - path to output directory (default: ./)
-p / --prefix - prefix of subdirectory for output (default: md5 digest)
--threads - number of threads (default: 1)
--zscore - Z-score threshold for filtering PMIDs

Example:

python gene2wordclouds.py --input-file ./dbTF/Homo_sapiens.uniacc.txt --input-type uniacc --email [email protected] --prefix BBA --zscore 2
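The README does not state which quantity the --zscore filter is computed on. One plausible reading, assumed here purely for illustration, is that the number of query genes linked to each PMID is counted (as gene2pmids_stats.py might do) and PMIDs whose count is an outlier, such as large-scale studies that mention many TFs at once, are dropped. Under that assumption the filter could be sketched as follows; the function names and the use of the population standard deviation are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def genes_per_pmid(gene_to_pmids):
    """Invert an EntrezID -> PMIDs mapping into PMID -> set of EntrezIDs."""
    inverted = defaultdict(set)
    for gene, pmids in gene_to_pmids.items():
        for pmid in pmids:
            inverted[pmid].add(gene)
    return dict(inverted)


def zscore_filter(gene_to_pmids, threshold=2.0):
    """Drop PMIDs whose gene count has a z-score above `threshold`.

    ASSUMPTION: the z-score is taken over genes-per-PMID counts; the real
    pipeline may compute it on a different statistic.
    """
    counts = {p: len(g) for p, g in genes_per_pmid(gene_to_pmids).items()}
    if not counts:
        return {gene: set() for gene in gene_to_pmids}
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        keep = set(counts)  # all counts equal: nothing is an outlier
    else:
        keep = {p for p, c in counts.items() if (c - mu) / sigma <= threshold}
    return {gene: set(pmids) & keep for gene, pmids in gene_to_pmids.items()}


if __name__ == "__main__":
    # Five genes, each with its own PMID, plus one PMID ("big") shared by all.
    gene_to_pmids = {f"G{i}": {"big", f"p{i}"} for i in range(1, 6)}
    print(zscore_filter(gene_to_pmids, threshold=2.0)["G1"])  # {'p1'}
```

Under this reading, --zscore 2 (as in the example command) removes a PMID only if its gene count lies more than two standard deviations above the mean; in the toy data the shared PMID has a z-score of about 2.24 and is dropped.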

Web Interface

To launch the web interface in your default web browser, open a terminal and run the following command from the root folder of the repository:

R -e "shiny::runApp('./app/app.R', launch.browser = TRUE)"

This will automatically open a new browser window or tab with the URL http://127.0.0.1:7422.
To stop the app, press Ctrl+C in the terminal.

wassermanlab/gene2wordclouds
