Plasmid.ai is the largest open-source toolkit for developing plasmid foundation models. Created by the iGEM Toronto team, this project aims to revolutionize the field of synthetic biology by leveraging machine learning to generate novel plasmids.
- Best Model across 400+ teams from 50+ countries
- Best project in North America
- Top 10 globally in the Overgrad category
## Table of Contents

- Overview
- Features
- Installation
- Usage
- Project Structure
- Authors and acknowledgment
- Contributing
- License
## Overview

Plasmid.ai provides a comprehensive set of tools and models for the analysis, design, and generation of plasmids. By utilizing state-of-the-art machine learning techniques, this project enables researchers and synthetic biologists to explore new possibilities in plasmid engineering and design. For more information about our team and project, visit our iGEM Team Wiki.
## Features

- Plasmid Sequence Tokenization: Utilizes custom tokenizers tailored for encoding plasmid sequences.
- Data Preprocessing Pipelines: Includes robust modules for loading, preprocessing, and visualizing plasmid data.
- Advanced Sampling Techniques: Provides cutting-edge sampling functions for generating novel plasmids based on trained models.
- Lightning Integration: Seamlessly integrates with PyTorch Lightning for distributed training and model scalability.
- Custom Model Components: Features specialized optimizers and callbacks for enhanced model performance.
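To make these features concrete, here is a rough sketch of a tokenize-and-sample workflow. Note that `PlasmidTokenizer`, `from_pretrained`, `tokenize`, `decode`, and `pai.sample` are illustrative assumptions rather than the package's confirmed API; the supported entry points are the CLI commands shown under Usage.

```python
# Illustrative sketch only: these names are assumptions, not the confirmed
# plasmidai API. The documented entry points are the CLI commands under Usage.
import plasmidai as pai

# Hypothetical tokenizer wrapping the custom vocabularies in data/tokenizers/.
tokenizer = pai.PlasmidTokenizer.from_pretrained("data/tokenizers")

sequence = "ATGCGTACGTTAGC"               # a (truncated) plasmid sequence
token_ids = tokenizer.tokenize(sequence)  # nucleotides -> token ids

# Hypothetical sampling helper mirroring experimental/sample.py: draw novel
# plasmids from a trained checkpoint with nucleus (top-p) sampling.
samples = pai.sample(
    checkpoint_path="checkpoints/last.ckpt",
    num_samples=10,
    top_p=0.9,  # same nucleus-sampling knob exposed by the CLI
)
print(tokenizer.decode(samples[0]))
```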
## Installation

To install the Plasmid.ai package, run the following commands:
```bash
pip install --upgrade pip setuptools wheel
pip install plasmidai
```
For development or to access the latest features, you can clone the repository:
```bash
git clone https://github.com/igem-toronto/plasmidai.git
cd plasmidai
pip install --upgrade pip setuptools wheel
pip install -e .
```
You can use conda or poetry to manage dependencies.
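For example, a conda-based development setup might look like this (the environment name and Python version are placeholders):

```bash
# Placeholder environment name and Python version; adjust to your setup.
conda create -n plasmidai python=3.10
conda activate plasmidai
pip install --upgrade pip setuptools wheel
pip install -e .
```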
## Usage

Here's a basic example of how to use Plasmid.ai:
```python
import plasmidai as pai
```
```bash
# Training
python -m plasmidai.experimental.train \
    --backend.matmul_precision=medium \
    --data.batch_size=64 --data.num_workers=4 \
    --lit.fused_add_norm=true --lit.scheduler_span=50000 --lit.top_p=0.9 \
    --trainer.accelerator=gpu --trainer.devices=2 --trainer.precision=bf16-mixed \
    --trainer.wandb=true --trainer.wandb_dir="${REPO_ROOT}/logs" \
    --trainer.checkpoint=true --trainer.checkpoint_dir="${REPO_ROOT}/checkpoints" \
    --trainer.progress_bar=true \
    --trainer.max_epochs=175

# Generation
python -m plasmidai.experimental.sample \
    --backend.matmul_precision=medium \
    --sample.checkpoint_path="${REPO_ROOT}/checkpoints/last.ckpt" \
    --sample.precision=bfloat16 --sample.num_samples=10000 --sample.top_p=0.9 \
    --sample.wandb_dir="${REPO_ROOT}/logs"
```
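Both commands reference `REPO_ROOT`; the repository does not appear to set it for you, so a reasonable assumption is to point it at your clone before running them:

```bash
# Assumption: REPO_ROOT is an ordinary environment variable you set yourself.
export REPO_ROOT="$(pwd)"  # run from the root of your plasmidai checkout
```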
Check out the `slurm/` directory for more examples!
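For clusters, a minimal SLURM wrapper around the training command might look like the following sketch; every `#SBATCH` value here is a placeholder to adapt to your cluster, not a setting taken from the `slurm/` directory.

```bash
#!/bin/bash
# Hypothetical SLURM wrapper for the training command above;
# all #SBATCH values are placeholders, not repository defaults.
#SBATCH --job-name=plasmidai-train
#SBATCH --gres=gpu:2
#SBATCH --cpus-per-task=4
#SBATCH --time=24:00:00
#SBATCH --output=%x-%j.out

export REPO_ROOT="$HOME/plasmidai"  # placeholder path to your checkout
cd "$REPO_ROOT"

python -m plasmidai.experimental.train \
    --trainer.accelerator=gpu --trainer.devices=2 --trainer.precision=bf16-mixed \
    --trainer.checkpoint=true --trainer.checkpoint_dir="${REPO_ROOT}/checkpoints" \
    --trainer.max_epochs=175
```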
## Project Structure

The Plasmid.ai project is organized into several key components:

- `data/`: Contains datasets and scripts for data processing.
  - `scripts/`: Helper scripts for data manipulation.
  - `tokenizers/`: Custom tokenizers for plasmid sequences.
- `datasets/`: Modules for loading and preprocessing plasmid datasets.
- `experimental/`: Cutting-edge features and models in development.
  - `callbacks.py`: Custom callbacks for model training.
  - `lit.py`: Lightning modules for PyTorch Lightning integration.
  - `optimizers.py`: Custom optimizers for training plasmid models.
  - `sample.py`: Functions for sampling from trained models.
  - `train.py`: Training pipelines for plasmid models.
- `utils.py`: Utility functions used across the project.
- `paths.py`: Path configurations for the project.
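Assuming the installed package root is `plasmidai`, this layout maps to imports like the following (a sketch inferred from the tree above; the exact public API may differ):

```python
# Module paths inferred from the project layout; not a confirmed public API.
from plasmidai.experimental import lit         # PyTorch Lightning modules
from plasmidai.experimental import optimizers  # custom optimizers
from plasmidai import utils, paths             # shared helpers and path config
```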
## Authors and acknowledgment

This project is developed by the iGEM Toronto 2024 team. We would like to extend our gratitude to all the team members and contributors who have made this project possible. Special thanks to our mentors and collaborators for their guidance and support.
## Contributing

We welcome contributions from the community! Please open an issue first to discuss the changes you would like to make.
## License

We use the Apache-2.0 license.