Medical Research Data Fetcher

A Python tool for fetching and processing medical research papers from PubMed and other academic sources. It searches for, downloads, and organizes scientific articles together with their metadata.

Features

Advanced search capabilities using PubMed's E-utilities
PDF download support with automatic retry mechanism
Comprehensive metadata extraction (authors, references, MeSH terms, etc.)
Built-in rate limiting to respect API guidelines

Prerequisites

Python 3.8 or higher
A registered email address for PubMed E-utilities

Installation

Clone the repository:

git clone https://github.com/qiqt/medical-research-fetcher.git
cd medical-research-fetcher

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: .\venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Configuration

Create environment configuration:

cp env.example .env

Edit .env with your settings:

PUBMED_EMAIL=your.email@example.com
PUBMED_TOOL=YourToolName
STORAGE_PATH=./data
PDF_STORAGE_PATH=./data/pdfs
REQUEST_DELAY=0.34
MAX_RETRIES=3

Required settings:

PUBMED_EMAIL: Your email address (required by PubMed)
PUBMED_TOOL: Name of your tool/application
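NCBI asks E-utilities clients to identify themselves through the tool and email request parameters, which is why these two settings are required. The sketch below shows how the settings might map onto an ESearch request URL. It is illustrative only: build_esearch_url is a hypothetical helper, not part of this package, and the package's own client may build its requests differently.

```python
import os
from urllib.parse import urlencode

# Public ESearch endpoint of the NCBI E-utilities
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query, max_results, env=None):
    """Hypothetical helper: build a PubMed ESearch URL from the settings.

    env defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env

    # Fail early if a required setting is missing or empty
    missing = [key for key in ("PUBMED_EMAIL", "PUBMED_TOOL") if not env.get(key)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))

    params = {
        "db": "pubmed",
        "term": query,
        "retmax": max_results,
        "retmode": "json",
        "tool": env["PUBMED_TOOL"],    # identifies the application
        "email": env["PUBMED_EMAIL"],  # contact address for NCBI
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

Calling build_esearch_url("cancer immunotherapy", 10) with both settings exported yields a URL that can be passed to any HTTP client.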

Optional settings:

STORAGE_PATH: Base path for storing data (default: ./data)
PDF_STORAGE_PATH: Path for PDF storage (default: ./data/pdfs)
REQUEST_DELAY: Delay between API requests in seconds (default: 0.34)
MAX_RETRIES: Maximum retry attempts for failed requests (default: 3)
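The default REQUEST_DELAY of 0.34 seconds keeps traffic just under three requests per second, the limit NCBI sets for clients without an API key. As a rough sketch of how REQUEST_DELAY and MAX_RETRIES could interact, the example below throttles calls and retries failures. Throttle and call_with_retries are hypothetical names, and the sketch assumes MAX_RETRIES counts retries after the first attempt; the package's actual behavior may differ.

```python
import os
import time

# Read the optional settings, falling back to the documented defaults
REQUEST_DELAY = float(os.environ.get("REQUEST_DELAY", "0.34"))
MAX_RETRIES = int(os.environ.get("MAX_RETRIES", "3"))


class Throttle:
    """Hypothetical helper: enforce a minimum interval between calls."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def call_with_retries(func, throttle, max_retries=MAX_RETRIES):
    """Call func once, then retry up to max_retries times on OSError.

    Assumption: max_retries counts retries, so func runs at most
    max_retries + 1 times. Every attempt passes through the throttle.
    """
    last_error = None
    for _ in range(max_retries + 1):
        throttle.wait()
        try:
            return func()
        except OSError as exc:  # e.g. urllib.error.URLError subclasses OSError
            last_error = exc
    raise last_error
```

The clock and sleep arguments exist only so the throttle can be exercised without real waiting; in normal use Throttle(REQUEST_DELAY) is enough.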

Usage

Basic Example

import asyncio

from medical_research_fetcher import ArticleProcessor, load_config
from medical_research_fetcher.fetchers.pubmed import PubMedClient
from medical_research_fetcher.storage import LocalStorage

async def main():
    # Load settings (see Configuration)
    config = load_config()

    # Wire the PubMed client and local storage into the processor
    client = PubMedClient(config.get_pubmed_config())
    storage = LocalStorage(config.storage_path)
    processor = ArticleProcessor(client, storage)

    # Search PubMed and process up to 10 matching articles
    results = await processor.search_and_process(
        query="cancer immunotherapy",
        max_results=10
    )

    print(f"Found {results['total_articles_found']} articles")
    print(f"Successfully processed: {results['successfully_processed']}")

if __name__ == "__main__":
    # main() is a coroutine, so it must be run inside an event loop
    asyncio.run(main())

Directory Structure

After running, the tool creates the following directory structure:

data/
├── pdfs/                 # Downloaded PDF files
└── metadata/
    ├── xml/              # Article XML metadata
    ├── summary/          # Article summaries
    └── searches/         # Search results and summaries
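For scripts that need this layout to exist before the first run, it can be created up front. The following is a minimal sketch that assumes the default STORAGE_PATH and a PDF directory nested under it; ensure_layout is a hypothetical helper, not a function provided by this package.

```python
from pathlib import Path

# Subdirectories from the tree above, relative to the storage base path
SUBDIRS = (
    "pdfs",
    "metadata/xml",
    "metadata/summary",
    "metadata/searches",
)


def ensure_layout(base="./data"):
    """Hypothetical helper: create the storage directories if missing."""
    base_path = Path(base)
    created = []
    for sub in SUBDIRS:
        path = base_path / sub
        # parents=True also creates data/ and metadata/; exist_ok makes reruns safe
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

If PDF_STORAGE_PATH points somewhere outside STORAGE_PATH, that directory would need to be created separately.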

License

This project is licensed under the MIT License - see the LICENSE file for details.
