PubMed is an online database of biomedical literature from MEDLINE, life science journals, and online books. It contains over 35 million citations, covering various areas of research related to biomedicine and health since 1965.
Our work aims to offer easy-to-access insights into hot research areas and to structure and organize the vast amount of information available in PubMed.
We developed pmtrendviz, a Python-based text-analytics tool that uses document embedding and clustering methods to identify research areas without supervision and to derive trends on a per-cluster basis for a given number of the clusters most similar to a query.
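To make the idea concrete, here is a minimal, self-contained sketch of the general technique: embed documents, cluster them, rank clusters by similarity to a query, then count publications per year in each matching cluster. It is not pmtrendviz's implementation. The bag-of-words `embed` function stands in for a real document embedding, the small cosine-based k-means stands in for the tool's clustering models, and all function names are hypothetical.

```python
import math
from collections import Counter

Vector = list[float]


def embed(text: str, vocab: list[str]) -> Vector:
    """Toy stand-in for a document embedding: L2-normalised bag of words."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def kmeans(vectors: list[Vector], k: int, iterations: int = 10):
    """k-means using cosine similarity, with deterministic farthest-first init."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Pick the vector least similar to every centroid chosen so far.
        farthest = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def trends_for_query(docs: list[tuple[int, str]], query: str,
                     k: int = 2, top_n: int = 1) -> dict[int, dict[int, int]]:
    """Cluster (year, text) docs; return per-year counts of the top_n closest clusters."""
    vocab = sorted({word for _, text in docs for word in text.lower().split()})
    vectors = [embed(text, vocab) for _, text in docs]
    labels, centroids = kmeans(vectors, k)
    query_vec = embed(query, vocab)
    ranked = sorted(range(k), key=lambda c: cosine(query_vec, centroids[c]),
                    reverse=True)[:top_n]
    trends: dict[int, dict[int, int]] = {}
    for c in ranked:
        per_year = Counter(year for (year, _), label in zip(docs, labels) if label == c)
        trends[c] = dict(sorted(per_year.items()))
    return trends


if __name__ == "__main__":
    docs = [
        (2019, "protein folding structure"),
        (2020, "protein structure prediction"),
        (2021, "protein folding prediction"),
        (2021, "protein structure folding"),
        (2020, "vaccine immune response"),
        (2021, "vaccine trial immune"),
    ]
    print(trends_for_query(docs, "protein folding"))  # {0: {2019: 1, 2020: 1, 2021: 2}}
```

Cluster ids depend on initialisation and carry no meaning of their own; what matters is that the per-year counts of a matching cluster form a time series that can be plotted as a trend.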
- Python 3.10
- Docker 20.10.20
- Node 19.4.0 (see the instructions)
- Git LFS (optional, for installing pre-trained pmtrendviz models)
  - On WSL2, you may need to install git-lfs manually, see this thread
Minimum hardware:
- 8GB RAM
- 10GB free disk space
Recommended hardware:
- 32GB RAM
- 70GB free disk space
- GPU (optional)
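Before installing, a quick way to check the software prerequisites is a small script like the following. It is a hypothetical helper, not part of pmtrendviz: it only tests that Python is at least 3.10, that `docker`, `node` and `git-lfs` executables are on `PATH`, and that the current directory's disk has at least the 10GB of free space listed above. It does not verify exact tool versions or RAM, and a missing `git-lfs` only matters if you want the pre-trained models.

```python
import shutil
import sys

# Hypothetical thresholds taken from the requirement lists above.
MIN_PYTHON = (3, 10)
MIN_FREE_DISK_GB = 10
TOOLS = ("docker", "node", "git-lfs")


def check_requirements(path: str = ".") -> dict[str, bool]:
    """Return a pass/fail flag for each requirement checkable with the stdlib."""
    results = {"python>=3.10": sys.version_info[:2] >= MIN_PYTHON}
    for tool in TOOLS:
        # shutil.which returns None if the executable is not on PATH.
        results[tool] = shutil.which(tool) is not None
    free_gb = shutil.disk_usage(path).free / 1024**3
    results[f"disk>={MIN_FREE_DISK_GB}GB"] = free_gb >= MIN_FREE_DISK_GB
    return results


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        status = "OK" if ok else "MISSING"
        print(f"{status:8}{name}")
```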
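Clone the repository:
git clone https://github.com/psaegert/pmtrendviz.git
cd pmtrendviz
Create and activate a Python 3.10 environment, either with conda:
conda create -n pmtrendviz python=3.10
conda activate pmtrendviz
or with pyenv and venv (on Windows, activate with .venv\Scripts\activate instead):
pyenv install 3.10
pyenv local 3.10
python -m venv .venv
source .venv/bin/activate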
Option 1 (recommended):
Install the entire package with pip:
pip install -e .
Option 2:
If you prefer to run the main.py script directly without installing the package, install only the dependencies with the following command:
pip install -r requirements.txt
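To confirm that the environment is active and the installation worked, a sketch like this can help. The helper names are hypothetical; it relies on the standard facts that `sys.prefix` differs from `sys.base_prefix` inside a venv and that `conda activate` sets `CONDA_DEFAULT_ENV`. `importlib.util.find_spec` locates a package without importing it, so it detects the editable install from Option 1; with Option 2 the package itself is not installed, so `is_importable("pmtrendviz")` may return False.

```python
import importlib.util
import os
import sys


def active_environment() -> str:
    """Describe the active Python environment: 'venv', 'conda:<name>', or 'none'."""
    if sys.prefix != sys.base_prefix:
        return "venv"  # inside a venv/virtualenv
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda:{conda_env}"  # set by `conda activate`
    return "none"


def is_importable(package: str) -> bool:
    """True if `package` is found on the import path (it is not imported)."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("environment:", active_environment())
    print("pmtrendviz importable:", is_importable("pmtrendviz"))
```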
Start the Docker services (elasticvue is optional and can be left out of the command):
docker compose up -d es01 elasticvue
Note: The es01 service is required for all steps of the pipeline.
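To check that es01 is up before running the pipeline, one option is to query Elasticsearch's root endpoint, which returns cluster and version information as JSON. This sketch assumes es01 is Elasticsearch exposed on its default port 9200 over plain HTTP without authentication; if the project's `docker-compose.yml` maps a different port or enables security, adjust the URL, otherwise the function will report the service as unreachable. The function name is hypothetical.

```python
import http.client
import json
import urllib.request
from typing import Optional

# Assumption: es01 listens on Elasticsearch's default port, plain HTTP, no auth.
ES_URL = "http://localhost:9200"


def elasticsearch_info(url: str = ES_URL, timeout: float = 2.0) -> Optional[dict]:
    """Return the JSON served at the Elasticsearch root URL, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections, timeouts and HTTP error statuses
        # (URLError and HTTPError subclass it); ValueError covers non-JSON bodies.
        return None


if __name__ == "__main__":
    info = elasticsearch_info()
    if info is None:
        print("es01 is not reachable")
    else:
        print("es01 is up, version", info.get("version", {}).get("number"))
```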
The pmtrendviz pipeline consists of four distinct steps (data collection, training, prediction, and visualization), which can be run in the following ways:
Option 1: CLI (recommended)
Check out the CLI Documentation or the minimal CLI example.
Option 2: Use pmtrendviz in your own Python code
Check out the minimal Python example.
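As an illustration of what the data-collection step has to handle, PubMed baseline files are XML documents made of `PubmedArticle` records. The sketch below streams the PMID, title and publication year out of such a file with the standard library's `xml.etree.ElementTree.iterparse`. It is a hypothetical helper, not pmtrendviz's own loader, and it reads only the `PubDate/Year` element, so citations whose date is given in another form (for example a `MedlineDate` string) get `None` as their year.

```python
import xml.etree.ElementTree as ET
from typing import IO, Iterator, NamedTuple, Optional, Union


class Citation(NamedTuple):
    pmid: str
    title: str
    year: Optional[int]


def parse_citations(source: Union[str, IO[bytes]]) -> Iterator[Citation]:
    """Stream Citation records from a PubmedArticleSet XML file or file object."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "PubmedArticle":
            continue
        citation = elem.find("MedlineCitation")
        if citation is not None:
            title_elem = citation.find("Article/ArticleTitle")
            # itertext() keeps text nested in inline markup such as <i>...</i>.
            title = "".join(title_elem.itertext()) if title_elem is not None else ""
            year_text = citation.findtext("Article/Journal/JournalIssue/PubDate/Year")
            yield Citation(
                pmid=citation.findtext("PMID", default=""),
                title=title,
                year=int(year_text) if year_text else None,
            )
        elem.clear()  # drop the finished record's contents to limit memory use
```

Baseline files are distributed gzip-compressed; the file object returned by `gzip.open(path)` can be passed to `parse_citations` directly.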
To start the visualization, run the start_backend.sh and start_frontend.sh scripts in two separate terminals.
Afterwards, open http://localhost:5173/ in your browser and start typing in the search bar. Be patient: loading the models into memory may take a while.
To set up the development environment, run the following command:
pip install -r requirements_dev.txt
We use:
- flake8 to enforce linting
- mypy to enforce static typing
- isort to enforce import sorting
- pytest to run tests against our code (see tests/)
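For illustration, a test module under tests/ written to pass these tools might look like the following: a fully type-annotated helper (for mypy), sorted standard-library imports (for isort), short lines (for flake8), and a `test_`-prefixed function that pytest collects automatically. The file name and the `count_per_year` helper are hypothetical; in practice the function under test would be imported from the package.

```python
# tests/test_example.py (hypothetical file name)
from collections import Counter


def count_per_year(years: list[int]) -> dict[int, int]:
    """Map each year to its number of publications, sorted by year."""
    return dict(sorted(Counter(years).items()))


def test_count_per_year() -> None:
    # pytest reports any failing assert in a test_ function as a failure.
    assert count_per_year([2021, 2019, 2021]) == {2019: 1, 2021: 2}
    assert count_per_year([]) == {}
```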
To run linting, static type checking, trailing-whitespace removal, and sorting of requirements.txt and imports automatically on every commit, run the following command:
pre-commit install
To run the pre-commit hooks manually, run the following command:
pre-commit run --all-files
Tests can be run with the following command:
pytest
If you use this code for your own research, please cite our work:
@misc{pmtrendviz,
author = {Paul Saegert and Philipp Steichen},
title = {Unsupervised Discovery Of Trends In Biomedical Research Based On The PubMed Baseline Repository},
year = {2023},
publisher = {GitHub},
journal = {GitHub Repository},
howpublished = {\url{https://github.com/psaegert/pmtrendviz}},
}