ReWoTes project from Anup #63

Open: wants to merge 63 commits into base: dev
63 commits
f6f3d50
Renamed the directory from example-github-username to kumaranu and mo…
kumaranu Jul 19, 2024
f7ced95
Update content for README.md.
kumaranu Jul 19, 2024
34c629c
Added a very preliminary file to get a structure to the project. It o…
kumaranu Jul 19, 2024
8e4b07d
Added a requirements.txt file.
kumaranu Jul 19, 2024
a59b8d2
Added a test file.
kumaranu Jul 19, 2024
1a1b8ee
Added __init__.py file inside the directory to recognize the director…
kumaranu Jul 19, 2024
264180d
Trying to add a workflow for github actions.
kumaranu Jul 19, 2024
3b25c7a
Added a requirements.txt in the main directory because Github actions…
kumaranu Jul 19, 2024
52424aa
Added pytest in the requirements.txt.
kumaranu Jul 19, 2024
ac54d12
Added a molecule with 16 atoms in the tests.
kumaranu Jul 19, 2024
e56c7fd
Not sure what was that. But added the 16 atom containing molecule ove…
kumaranu Jul 19, 2024
0ef9b2c
Added ase in the requirements file.
kumaranu Jul 19, 2024
cab4b02
Changed the basis_set_provider to have molecular structure in the ase…
kumaranu Jul 19, 2024
2c08451
Added a test to check nwchem energy call.
kumaranu Jul 19, 2024
0581399
Added an NWChem install call to GitHub actions file.
kumaranu Jul 19, 2024
2590bed
Added geometries to run the tests on. These are obtained from ANI-1cc…
kumaranu Jul 20, 2024
30e448a
Testing multiple basis sets.
kumaranu Jul 21, 2024
224e632
Added newer set of geometries and removed the previous ones. Now ther…
kumaranu Jul 21, 2024
0e85f4e
Added homo lumo convergence, gibbs free calculations.
kumaranu Jul 21, 2024
17ceb55
Corrected test values.
kumaranu Jul 22, 2024
7c08bf2
Changed the tests to not do gibbs calculations temporarily.
kumaranu Jul 23, 2024
120db5d
Added pandas to save data in csv.
kumaranu Jul 24, 2024
6fa3d09
Removed the files related to the second geometry as I am only compari…
kumaranu Jul 24, 2024
33e1f0a
Changed the test to run energy calculations for a list of basis sets …
kumaranu Jul 24, 2024
7f26497
Adding first results file for NWChem with different Pople basis sets.
kumaranu Jul 24, 2024
61b0bf5
Added files for the larger molecules as well. The data was not genera…
kumaranu Jul 25, 2024
8ad1c8b
Added the test files for the cases for which the error data was gener…
kumaranu Jul 25, 2024
4b48084
Did a few things: 1) made vibrational analysis an if statement, 2) ad…
kumaranu Jul 25, 2024
ebc227e
Not sure if this is just a copy of the error file but keeping it here…
kumaranu Jul 25, 2024
5d65510
A file containing a class to load error data from a csv file. This ca…
kumaranu Jul 26, 2024
4729bb9
Added tests for basis set provider class.
kumaranu Jul 26, 2024
b479ed2
Added tests for basis sets selector class.
kumaranu Jul 26, 2024
93219fb
Added basic tests for basis_set_provider class.
kumaranu Jul 26, 2024
88908bf
Added a class for calculator related functions. More things can be ad…
kumaranu Jul 26, 2024
7f4285b
Added a class for data collection. This is to generate the error info…
kumaranu Jul 26, 2024
c814c9c
BasisSetSelector class deals with the logic used to return the final …
kumaranu Jul 26, 2024
544ab8a
BasisSetProvider class is the main class that combines everything tog…
kumaranu Jul 26, 2024
a3e4536
Added a few tests for the basisSetProvider.
kumaranu Jul 26, 2024
ce37bd4
Added tests for the calculator used here. Only checking vibrational a…
kumaranu Jul 26, 2024
1023dc6
Added tests to collect data from csv and also generated from the geom…
kumaranu Jul 26, 2024
504d942
Created a smaller dataset to test the functions in an easier way.
kumaranu Jul 26, 2024
c5c21a5
This is csv output for the smaller case.
kumaranu Jul 26, 2024
088b2cc
Added skip marks to older file's tests.
kumaranu Jul 26, 2024
179aa2d
This was the file being skipped.
kumaranu Jul 26, 2024
becce61
Just remove the check for the key name for the second index.
kumaranu Jul 26, 2024
726ce1e
Added the csv for the three molecule case.
kumaranu Jul 26, 2024
4f5414c
Removed an unused test file.
kumaranu Jul 26, 2024
500e23f
Renamed a file.
kumaranu Jul 26, 2024
506b795
Modified the test file for basissetselector to not read csv from a file.
kumaranu Jul 26, 2024
4f8126f
Added docstrings to basis set provider class.
kumaranu Jul 26, 2024
b772469
Added docstrings to data collector class.
kumaranu Jul 26, 2024
fbeed92
Added docstrings to basissetselector class.
kumaranu Jul 26, 2024
47d99f1
Added docstrings to the calculator class.
kumaranu Jul 26, 2024
c3e1079
Added version in the __init__.py for setuptools.
kumaranu Jul 26, 2024
77411e5
Added setup.py for installation purposes.
kumaranu Jul 26, 2024
c396fc0
Got rid of project_root paths.
kumaranu Jul 26, 2024
5af73bc
Again got rid of some project root path arguments.
kumaranu Jul 26, 2024
34b1194
project_root paths removed.
kumaranu Jul 26, 2024
0bd4edb
Got rid of test_dir paths.
kumaranu Jul 26, 2024
1e40e09
Just made the arguments multiline for better readability.
kumaranu Jul 26, 2024
0a001ff
Updated README
kumaranu Jul 26, 2024
9a4afe4
Removed csv file from the args.
kumaranu Jul 26, 2024
6648355
Updated Readme.
kumaranu Jul 26, 2024
32 changes: 32 additions & 0 deletions .github/workflows/python-test.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
# .github/workflows/python-test.yml
name: Python package

on: [push, pull_request]

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - name: Checkout repository
      uses: actions/checkout@v2

    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: '3.10'

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt

    - name: Install NWChem
      run: |
        sudo apt-get update
        sudo apt-get install nwchem -y  # Install NWChem

    - name: Run tests
      run: |
        pytest kumaranu/tests
7 changes: 7 additions & 0 deletions basis_set_error_data.csv
@@ -0,0 +1,7 @@
geometry,STO-3G-error-percent,3-21G-error-percent,6-31G-error-percent,6-31G*-error-percent,6-31G**-error-percent
"[[0.0, 0.0, 0.559859335422516], [0.0, 0.0, -0.559859335422516]]",-1.2652515155542783,-0.4917908825196436,-0.03702286490532115,-0.08503021510053684,-0.08503021510053684
"[[-0.214321196079254, -0.084094531834126, 0.069681838154793], [1.531570315361023, 0.790129363536835, 0.035592794418335], [-1.336195707321167, 0.05733397603035, -0.031842533499002], [1.138465881347656, -0.085862502455711, -0.028848262503743]]",-1.267831657961443,-0.4727418843765934,-0.04650275463400509,-0.08341676419349328,-0.08694225426471786
"[[0.014473460614681, 0.497637867927551, -0.0], [-0.188154995441437, 1.678797006607056, -0.0], [0.014473460614681, -0.666374921798706, 0.0]]",-1.2017316341566104,-0.44917784243684294,-0.07095387279989016,-0.10394781418823773,-0.10591267886803132
"[[0.0, -0.0, -0.68626481294632], [-0.0, 0.0, 0.514698624610901]]",-1.266834206595056,-0.475787564741829,-0.043394708420021647,-0.08409217074733753,-0.08409217074733753
"[[0.509791612625122, 0.026870004832745, -0.023425199091434], [1.010899424552917, 0.925699234008789, 0.045212235301733], [1.415517210960388, -0.839163362979889, 0.0434595271945], [-0.685645818710327, -0.030969487503171, 0.006484928540885]]",-1.2672524312671152,-0.47041342190954255,-0.052250836697970725,-0.0883306127690224,-0.09078009857240983
"[[0.0, 0.076925121247768, 0.0], [1.179724931716919, -0.162000626325607, -0.0], [-1.179724931716919, 0.104306787252426, 0.0]]",-1.2997750766051088,-0.48029585658507695,-0.03818469310548968,-0.07979115363540501,-0.07979115363540501
6 changes: 0 additions & 6 deletions example-github-username/README.md

This file was deleted.

87 changes: 87 additions & 0 deletions kumaranu/README.md
@@ -0,0 +1,87 @@
# Basis set selector (Chemistry)

> Ideal candidate: scientists skilled in Density Functional Theory and proficient in Python.

# Overview

The aim of this task is to create a simple Python package that implements an automatic basis set selection mechanism for a quantum chemistry engine.

# Requirements

1. automatically find the basis set that delivers a particular precision, passed as an argument (e.g. within 0.01% of the reference)
1. use either experimental data or higher-fidelity modeling results (e.g. coupled cluster) as reference data
1. example properties to converge: HOMO-LUMO gaps, vibrational frequencies

# Expectations

- mine reference data for use during the project
- correctly find a basis set that satisfies a desired tolerance for a set of 10-100 molecules, from H2 as the simplest up to molecules with 10-20 atoms
- modular and object-oriented implementation
- commit early and often - at least once per 24 hours

# Timeline

We leave the exact timing to the candidate. The work must fit within 5 days in total.

# User story

As a user of this software I can start it passing:

- molecular structure
- reference datapoint
- tolerance (precision)

as parameters and get the basis set that satisfies the tolerance criterion.
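
One way to read the tolerance criterion in this user story is as a bound on the absolute percent deviation from the reference value. The following is a minimal sketch under that assumption; the function names `percent_error` and `within_tolerance` and the numbers used are illustrative and are not part of the task description.

```python
def percent_error(value, reference):
    """Signed deviation of `value` from `reference`, in percent."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return 100.0 * (value - reference) / abs(reference)


def within_tolerance(value, reference, tolerance_percent):
    """True when the absolute percent error does not exceed the tolerance."""
    return abs(percent_error(value, reference)) <= tolerance_percent


# Illustrative numbers only: a computed value of 9.99 against a reference of 10.0.
print(percent_error(9.99, 10.0))
print(within_tolerance(9.99, 10.0, 0.5))
print(within_tolerance(9.99, 10.0, 0.01))
```

With these numbers the deviation is about -0.1%, so a 0.5% tolerance is met and a 0.01% tolerance is not.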

# Notes

- create an account at exabyte.io and use it for the calculation purposes
- suggested modeling engine: NWChem or SIESTA

## Getting Started

### Clone the Repository:

```
git clone https://github.com/kumaranu/rewotes.git

cd rewotes
```

### Create and Activate Conda Environment:

```
conda create -n test0 python=3.10
conda activate test0
```

### Install the Package:

```
pip install -e .
```

### Run the Python Script Given Below:

```
import importlib.util  # find_spec lives in the importlib.util submodule
from pathlib import Path
from ase.atoms import Atoms
from kumaranu.basisSetProvider import BasisSetProvider

# Define project_root
kumaranu_spec = importlib.util.find_spec('kumaranu')
project_root = Path(kumaranu_spec.origin).parent.parent

# Define the molecules
mol = Atoms('CO2', positions=[[0, 0, 0], [1, 1.01, 1], [-1, -1.03, -1]])
ref = Atoms('CO2', positions=[[0, 0, 0], [1.01, 1, 1], [-1, -1.01, -1]])

# Set the tolerance
tolerance = 0.5

# Create the BasisSetProvider object
basisProviderObject = BasisSetProvider(
    tolerance,
    files_dir=str(project_root / 'kumaranu/tests/three_molecules'),
    recalculate_errors=True,
)

# Get the selected basis set
selected_basis = basisProviderObject.get_basis_set(mol, ref)
print(selected_basis)
```
4 changes: 4 additions & 0 deletions kumaranu/__init__.py
@@ -0,0 +1,4 @@
# kumaranu/__init__.py

__version__ = '0.1'

87 changes: 87 additions & 0 deletions kumaranu/basisSetProvider.py
@@ -0,0 +1,87 @@
from typing import List, Optional
from ase.atoms import Atoms
from kumaranu.dataCollector import DataCollector
from kumaranu.basisSetSelector import BasisSetSelector
from pathlib import Path


class BasisSetProvider:
    """
    Provides the best basis set for a given molecular structure within a specified tolerance.

    Parameters
    ----------
    tolerance : float
        The tolerance (maximum absolute error, in percent) for selecting the basis set.
    error_data_file : str, optional
        The path to the CSV file containing error data for basis sets.
        If not provided, `basis_set_error_data.csv` inside `files_dir` will be used.
    files_dir : str, optional
        The directory containing molecular XYZ files.
        If not provided, a default directory within the project root will be used.
    basis_sets : List[str], optional
        A list of basis sets to be considered.
        If not provided, a default list of common basis sets will be used.
    recalculate_errors : bool, optional
        Whether to recalculate errors and update the error data file.
        Default is False.

    Methods
    -------
    get_basis_set(molecular_structure, reference_datapoint)
        Returns the best basis set for the given molecular structure within the specified tolerance.
    """
    def __init__(
        self,
        tolerance: float,
        error_data_file: Optional[str] = None,
        files_dir: Optional[str] = None,
        basis_sets: Optional[List[str]] = None,
        recalculate_errors: bool = False,
    ):
        self.tolerance = tolerance
        # This file lives in <root>/kumaranu/, so parents[1] is the project root.
        self.files_dir = files_dir if files_dir \
            else f'{Path(__file__).resolve().parents[1]}/kumaranu/tests/molecule_xyz_files'
        # Build the default CSV path from the resolved directory; the raw argument may be None.
        self.error_data_file = error_data_file if error_data_file \
            else f'{self.files_dir}/basis_set_error_data.csv'
        self.basis_sets = basis_sets if basis_sets else [
            "STO-3G", "3-21G", "6-31G", "6-31G*", "6-31G**",
            "6-311G", "6-311G*", "6-311G**", "6-311++G**", "6-311++G(2d,2p)",
        ]
        self.recalculate_errors = recalculate_errors

    def get_basis_set(
        self,
        molecular_structure: Atoms,
        reference_datapoint: Atoms,
    ) -> str:
        """
        Returns the best basis set for the given molecular structure within the specified tolerance.

        Parameters
        ----------
        molecular_structure : Atoms
            The molecular structure for which to select the basis set.
        reference_datapoint : Atoms
            The reference molecular structure to compare against.

        Returns
        -------
        str
            The selected basis set within the specified tolerance.
        """
        # Optionally regenerate the error data before loading it.
        if self.recalculate_errors:
            data_collector = DataCollector(
                self.files_dir,
                self.basis_sets,
            )
            data_collector.collect_and_store_data()
        error_data, basis_sets = DataCollector.load_error_data(self.error_data_file)

        selector = BasisSetSelector(
            molecular_structure,
            reference_datapoint,
            self.tolerance,
            error_data,
            basis_sets,
        )
        return selector.select_basis_set()
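
The fallback rules described in the class docstring can be sketched in isolation. This is a minimal illustration, assuming the default XYZ directory sits under the package's `tests` folder and the default CSV sits inside whichever directory was resolved; `resolve_paths` and its `package_dir` argument are hypothetical names and do not appear in the PR.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_paths(
    package_dir: Path,
    files_dir: Optional[str] = None,
    error_data_file: Optional[str] = None,
) -> Tuple[Path, Path]:
    """Return (files_dir, error_data_file) with defaults filled in."""
    # Fall back to the XYZ files assumed to ship with the package's tests.
    resolved_dir = Path(files_dir) if files_dir else package_dir / "tests" / "molecule_xyz_files"
    # The CSV default depends on the resolved directory, not on the raw argument.
    resolved_csv = (
        Path(error_data_file) if error_data_file
        else resolved_dir / "basis_set_error_data.csv"
    )
    return resolved_dir, resolved_csv


print(resolve_paths(Path("/repo/kumaranu")))
print(resolve_paths(Path("/repo/kumaranu"), files_dir="/data/three_molecules"))
```

Resolving the directory first means a caller who passes only `files_dir` still gets a CSV path inside that directory.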
79 changes: 79 additions & 0 deletions kumaranu/basisSetSelector.py
@@ -0,0 +1,79 @@
import numpy as np
from ase.atoms import Atoms


class BasisSetSelector:
    """
    Selects the best basis set for a given molecular structure within a specified tolerance.

    Parameters
    ----------
    molecular_structure : Atoms
        The molecular structure for which the basis set needs to be selected.
    reference_datapoint : Atoms
        The reference molecular structure used to compare and find the best basis set.
    tolerance : float
        The acceptable error tolerance (in percent) for selecting the basis set.
    error_data : dict
        A dictionary mapping chemical formulas to error percentages for the basis sets.
    basis_sets : list
        A list of basis set column names (each ending in '-error-percent') to consider.

    Methods
    -------
    select_basis_set()
        Selects the best basis set that satisfies the error tolerance.
    """
    def __init__(
        self,
        molecular_structure: Atoms,
        reference_datapoint: Atoms,
        tolerance: float,
        error_data: dict,
        basis_sets: list,
    ):
        self.molecular_structure = molecular_structure
        self.reference_datapoint = reference_datapoint
        self.tolerance = tolerance
        self.error_data = error_data
        self.basis_sets = basis_sets

    def select_basis_set(self) -> str:
        """
        Selects the best basis set that satisfies the error tolerance.

        This method compares the absolute error percentages of the available basis sets with the
        specified tolerance. The first basis set in list order that meets the tolerance is selected.
        If none does, the basis set with the minimum absolute error is selected.

        Returns
        -------
        str
            The name of the selected basis set.

        Raises
        ------
        ValueError
            If the chemical formula of the new geometry does not match the reference.
        """
        new_formula = str(self.molecular_structure.symbols)
        known_formula = str(self.reference_datapoint.symbols)

        if known_formula != new_formula:
            raise ValueError(
                f"The chemical formula for the new geometry ({new_formula}) and "
                f"the reference ({known_formula}) do not match.",
            )

        errors = np.abs(np.array(self.error_data[known_formula]))
        below_tolerance = errors <= self.tolerance

        if below_tolerance.any():
            # argmax on a boolean array returns the index of the first True.
            selected_index = np.argmax(below_tolerance)
            # Strip the 14-character '-error-percent' suffix from the column name.
            selected_basis = self.basis_sets[selected_index][:-14]
            return selected_basis
        else:
            best_index = np.argmin(errors)
            best_basis = self.basis_sets[best_index][:-14]
            print(f"Warning: No basis set can satisfy the tolerance of {self.tolerance}. "
                  f"Using the best available basis set, {best_basis}.")
            return best_basis
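
The selection rule in `select_basis_set` can be tried without ASE or the CSV file. This is a minimal standalone sketch; `pick_basis` is a hypothetical helper, and the error values are rounded from the first row of `basis_set_error_data.csv`.

```python
import numpy as np


def pick_basis(basis_sets, signed_errors, tolerance):
    """Return (basis, met): the first basis with |error| <= tolerance, else the smallest |error|."""
    errors = np.abs(np.asarray(signed_errors, dtype=float))
    within = errors <= tolerance
    if within.any():
        # argmax on a boolean array gives the index of the first True.
        return basis_sets[int(np.argmax(within))], True
    # Nothing meets the tolerance: fall back to the closest basis set.
    return basis_sets[int(np.argmin(errors))], False


basis_sets = ["STO-3G", "3-21G", "6-31G", "6-31G*", "6-31G**"]
# Percent errors, rounded from the first CSV row.
errors = [-1.2653, -0.4918, -0.0370, -0.0850, -0.0850]

print(pick_basis(basis_sets, errors, 0.5))   # ('3-21G', True)
print(pick_basis(basis_sets, errors, 0.01))  # ('6-31G', False)
```

Because the first match wins, the order of `basis_sets` matters: listing them from smallest to largest returns the cheapest basis set that meets the tolerance.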