This repo contains code for the paper [Energy-Based generative models for monoclonal antibodies]. Some of the code was adapted from the repository [GFlowNet for Biological Sequence Design](https://github.com/MJ10/BioSeq-GFN-AL). See Licence for more information.
The code has been tested with Python 3.7, CUDA 11.3, and cuDNN 8.0.
- We recommend setting up an Anaconda environment before running the code.
- Before installing the requirements, make sure a C++ compiler is available on your machine (`apt-get install build-essential` on Ubuntu).
- Install the dependencies: `pip install -r requirements.txt`.
- Install ANARCI: `conda install bioconda::anarci`.
- Run the `download_data_and_embeddings` script to download the required ESM embeddings of AAYL49 and the sequences generated by our sampling methods.
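
Before running the entry points, it can help to confirm the prerequisites above are in place. The snippet below is a minimal sketch, not part of the repo: the function name `check_environment` is hypothetical, and it assumes the conda ANARCI package exposes an importable Python module named `anarci` and that `build-essential` provides `g++` or `c++` on the `PATH`.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Return a dict mapping each prerequisite to True/False (hypothetical helper)."""
    return {
        # The README reports testing with Python 3.7.
        "python>=3.7": sys.version_info >= (3, 7),
        # A C++ compiler is needed before `pip install -r requirements.txt`;
        # assumption: build-essential puts g++ (or c++) on the PATH.
        "c++ compiler": shutil.which("g++") is not None or shutil.which("c++") is not None,
        # Assumption: ANARCI installed via conda is importable as `anarci`.
        "anarci": importlib.util.find_spec("anarci") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:>15}: {'ok' if ok else 'MISSING'}")
```

Any line reporting `MISSING` points to the corresponding setup step above.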

`mcmc_covid.py`, `mcmc_true_aff.py`, and `mcmc_true_aff_hard.py` are the entry points for generating sequences with MCMC.
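
For readers unfamiliar with MCMC sampling from an energy-based model, the sketch below shows a generic Metropolis-Hastings sampler over amino-acid sequences with single-site mutation proposals, targeting p(x) ∝ exp(-E(x)/T). It is an illustration only, not the sampler implemented in these scripts: `toy_energy` is a hypothetical stand-in for a learned energy, and `metropolis_hastings` and its parameters are assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_energy(seq):
    """Hypothetical stand-in for a learned energy: each cysteine costs 1."""
    return float(seq.count("C"))


def metropolis_hastings(init_seq, energy, n_steps=1000, temperature=1.0, seed=0):
    """Sample from p(x) proportional to exp(-E(x)/T) using single-site mutations."""
    rng = random.Random(seed)
    seq = list(init_seq)
    current_e = energy("".join(seq))
    for _ in range(n_steps):
        # Propose: pick a position uniformly, then a residue uniformly.
        pos = rng.randrange(len(seq))
        proposal = seq.copy()
        proposal[pos] = rng.choice(AMINO_ACIDS)
        new_e = energy("".join(proposal))
        delta = new_e - current_e
        # The proposal is symmetric, so accept with probability min(1, exp(-delta/T)).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            seq, current_e = proposal, new_e
    return "".join(seq), current_e
```

Lowering `temperature` concentrates samples on low-energy sequences; in a real setup the energy would come from a trained model rather than a residue count.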

`run_covid.py`, `run_true_aff.py`, and `run_true_aff_hard.py` are the entry points for generating sequences with GFlowNet.

`antBO_simple.py` and `antBO_hard.py` are the entry points for generating sequences with antBO.
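
The entry points above can be collected in one place with a small launcher. This is a hypothetical sketch: the file name `launch.py`, the method/task labels, and `build_command` are not part of the repo, and the scripts' own command-line arguments are not documented here, so anything passed through as extra arguments is an assumption.

```python
import subprocess
import sys

# Entry points listed in this README; the method/task labels are hypothetical.
ENTRY_POINTS = {
    "mcmc": {"covid": "mcmc_covid.py", "true_aff": "mcmc_true_aff.py", "true_aff_hard": "mcmc_true_aff_hard.py"},
    "gflownet": {"covid": "run_covid.py", "true_aff": "run_true_aff.py", "true_aff_hard": "run_true_aff_hard.py"},
    "antbo": {"simple": "antBO_simple.py", "hard": "antBO_hard.py"},
}


def build_command(method, task, extra_args=()):
    """Return the argv list for one entry point; extra_args are passed through as-is."""
    try:
        script = ENTRY_POINTS[method][task]
    except KeyError:
        raise ValueError(f"unknown method/task: {method}/{task}") from None
    return [sys.executable, script, *extra_args]


if __name__ == "__main__":
    # Usage (hypothetical): python launch.py mcmc covid
    cmd = build_command(sys.argv[1], sys.argv[2], sys.argv[3:])
    subprocess.run(cmd, check=True)
```

Using `sys.executable` ensures the script runs under the same interpreter as the active Anaconda environment.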

Please reach out to Paul Pereira ([email protected]) with any issues, comments, questions, or suggestions.