Installation assumes CUDA version 12.0.
mamba env create -f chemreasoner.yml
conda activate chemreasoner
git clone https://github.com/pnnl/chemreasoner.git
cd chemreasoner
git submodule update --init --recursive
cd ext/ocp/
pip install -e .
cd ../Open-Catalyst-Dataset
pip install -e .
cd ../..
To test the installation:
python src/scripts/test_gnn.py # use --cpu to test on cpu only
The code to reproduce the ICML results is located in src/scripts/run_icml_queries.py. An example run script has been provided in src/launch_scripts/run_icml.sh. You will need to set a few parameters:
- savedir: The directory to save the results in
- start-query: The index of the first query to evaluate (see data/input_data/dataset.csv)
- end-query: The index of the final query to evaluate
- gnn-traj-dir: The directory in which to store relaxation trajectories
- dotenv-path: The path to the .env file containing API keys for your Azure OpenAI setup (see instructions below)
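As a rough illustration of how the parameters above fit together, the sketch below assembles a command line for src/scripts/run_icml_queries.py. It assumes each parameter is passed as a `--name value` flag, and `--end-query` is an assumed name for the final-query index. The helper `build_icml_command` and the example values are hypothetical; check src/launch_scripts/run_icml.sh for the actual invocation.

```python
import shlex


def build_icml_command(savedir, start_query, end_query, gnn_traj_dir, dotenv_path):
    """Assemble an argument list for run_icml_queries.py.

    Assumption: every parameter is a `--name value` flag; `--end-query`
    is an assumed flag name for the index of the final query.
    """
    return [
        "python", "src/scripts/run_icml_queries.py",
        "--savedir", str(savedir),
        "--start-query", str(start_query),
        "--end-query", str(end_query),  # assumed flag name
        "--gnn-traj-dir", str(gnn_traj_dir),
        "--dotenv-path", str(dotenv_path),
    ]


# Hypothetical example values, for illustration only.
cmd = build_icml_command("results/icml", 0, 10, "trajectories", ".env")
print(shlex.join(cmd))
```

Building the command as a list keeps paths with spaces intact if it is later passed to `subprocess.run`.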
The .env file should be located in the chemreasoner root directory and contain the API keys and connection details for your Azure OpenAI interface, which can be found on the Azure portal.
AZURE_OPENAI_DEPLOYMENT_NAME=<deployment name>
AZURE_OPENAI_ENDPOINT=<url to deployment endpoint>
AZURE_OPENAI_API_KEY=<api key>
AZURE_OPENAI_API_VERSION="2023-07-01-preview"
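Before launching a run, it can help to confirm that the .env file defines all four variables listed above. The following is a minimal sketch using only the Python standard library. The helper names (`parse_dotenv`, `missing_keys`, `check_dotenv_file`) are hypothetical and not part of chemreasoner, and the parser only handles simple `KEY=VALUE` lines.

```python
from pathlib import Path

# The four variables listed in the .env template above.
REQUIRED_KEYS = (
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
)


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text):
    """Return the required keys that are absent or empty in the given text."""
    values = parse_dotenv(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def check_dotenv_file(path=".env"):
    """Read a .env file from disk and report its missing keys."""
    return missing_keys(Path(path).read_text())


# Illustrative input with two of the four variables set.
sample = (
    "AZURE_OPENAI_DEPLOYMENT_NAME=my-deployment\n"
    'AZURE_OPENAI_API_VERSION="2023-07-01-preview"\n'
)
print(missing_keys(sample))
```

An empty list from `check_dotenv_file(".env")` would indicate that every expected variable has a value.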
To run relaxations with the GNN model, you will need to set up a Redis server. To do so, open a new terminal on the same machine that will run chemreasoner (with access to a GPU), then run:
redis-server --dir <directory to store redis server cache>
Here, --dir can be set to any directory.
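To confirm that the server is up before starting a run, a quick connectivity check can be useful. The sketch below sends a `PING` over a plain socket and looks for the `+PONG` reply. It assumes the server listens on localhost at the Redis default port 6379 with no password. The function name `redis_responds` is hypothetical.

```python
import socket


def redis_responds(host="localhost", port=6379, timeout=1.0):
    """Return True if a Redis server answers PING with +PONG.

    Assumptions: default Redis port 6379 and no authentication.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")  # Redis accepts inline commands
            return sock.recv(16).startswith(b"+PONG")
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


print(redis_responds())
```

If this prints `False`, check that `redis-server` is still running in the other terminal and that the port matches.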
Once you have set up the run script and the .env file and started the local Redis server, run the ICML code by entering
./src/launch_scripts/run_icml.sh
- ICML 2024: "Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback" arXiv
- Presentation at MLCommons Science Working Group
- We will have two presentations at the upcoming American Chemical Society Spring 2024 National Meeting!
- Sprueill H.W., C. Edwards, M.V. Olarte, U. Sanyal, H. Ji, and S. Choudhury. "Integrating generative AI with computational chemistry for catalyst design in biofuel/bioproduct applications." American Chemical Society Spring 2024 National Meeting, New Orleans, Louisiana (oral presentation).
- Sprueill H.W., C. Edwards, M.V. Olarte, U. Sanyal, K. Agarwal, H. Ji, and S. Choudhury. 03/18/2024. "Extreme-Scale Heterogeneous Inference with Large Language Models and Atomistic Graph Neural Networks for Catalyst Discovery." American Chemical Society Spring 2024 National Meeting, New Orleans, Louisiana (poster).
- Our work on Monte Carlo Thought Search is accepted for publication in EMNLP 2023 Findings (arXiv)
- Excited to present "ChemReasoner: Large Language Model-driven Search over Chemical Spaces with Quantum Chemistry-guided Feedback" at 2023 Stanford Graph Learning Workshop
- We are thrilled to be selected for the Microsoft Accelerate Foundation Models Research Initiative
- Presentation at AI Hardware and Edge AI Summit, Santa Clara, September 2023
Please cite the following papers [https://arxiv.org/abs/2310.14420] [https://arxiv.org/abs/2402.10980] if you find our work useful.
@inproceedings{sprueill2023MCR,
title={Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design},
author={Sprueill, Henry W. and Edwards, Carl and Sanyal, Udishnu and Olarte, Mariefel and Ji, Heng and Choudhury, Sutanay},
booktitle={In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP2023) Findings},
year={2023}
}
@article{sprueill2024chemreasoner,
title={CHEMREASONER: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback},
author={Sprueill, Henry W and Edwards, Carl and Agarwal, Khushbu and Olarte, Mariefel V and Sanyal, Udishnu and Johnston, Conrad and Liu, Hongbin and Ji, Heng and Choudhury, Sutanay},
journal={arXiv preprint arXiv:2402.10980},
year={2024}
}
Sutanay Choudhury sutanay tod choudhury ta pnnl tod gov