mongo.org

Examples of a Mongo database for ase calculations

Introduction

This document describes a MongoDB for vaspy calculations. Each calculation is saved as a document in the database with the atomic geometry, calculation parameters, and calculation results. Some data is stored redundantly to facilitate queries.

Here is an example of the data typically stored for one calculation.

from vasp.mongo import MongoDatabase
db = MongoDatabase()

import pprint
pprint.pprint(next(db.find({'calculator.class': 'Vasp'}, limit=1)))

It is easy to write atoms with arbitrary key-value pairs. Nothing checks for duplicates on your behalf, so you have to decide whether a write would add a duplicate entry.

Here is an example of adding an entry to the database.

from ase import Atoms
from ase.calculators.emt import EMT
h2 = Atoms('H2', [(0, 0, 0), (0, 0, 0.7)])
h2.calc = EMT()
print(h2.get_forces())

from vasp.mongo import MongoDatabase, mongo_doc
db = MongoDatabase()

doc = mongo_doc(h2)

print(db.write(doc, relaxed=False))

You can query the database by an id:

from vasp.mongo import MongoDatabase
from bson import ObjectId

db = MongoDatabase()
c = db.get_atoms({'_id': ObjectId('58c0b7d2340e3b29a6c6d4d2')})

a = next(c)

print(a)
print(a.get_potential_energy())
print(a.get_forces())

Searching by id only works if you already know the id. Here we query a different way, using the calculator type and some properties of the atoms.

from vasp.mongo import MongoDatabase

db = MongoDatabase()

hits = db.find({'calculator.class': 'EMT',
                'atoms.symbol_counts.H': 2,
                'atoms.natoms': 2,
                'relaxed': False,})

for hit in hits: print(hit)

We can add any kind of calculator.

from ase.calculators.singlepoint import SinglePointCalculator
from ase import Atoms

h2 = Atoms('H2', [(0, 0, 0), (0, 0, 0.7)])

calc = SinglePointCalculator(energy=0.0, atoms=h2)
h2.calc = calc

from vasp.mongo import MongoDatabase, mongo_doc

db = MongoDatabase()
print(db.write(mongo_doc(h2)))

We can then read the document back by its id:

from vasp.mongo import MongoDatabase
from bson import ObjectId

db = MongoDatabase()
c = db.find({'_id': ObjectId('58c0b835340e3b2a2afb4999')})

print(next(c))

Assumptions

We assume a Mongo server is running on localhost at port 27017, with an “ase” database and “atoms” collection by default. You can set all of these with arguments to MongoDatabase(). The server does not currently start automatically; it has to be started after each reboot. There is currently no security on the database.

Then you create atoms and write them to the database. You can also write arbitrary key-value pairs, as long as the values can be serialized to JSON.
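Before writing extra key-value pairs, it can help to check that a value survives json.dumps. This is a minimal sketch using only the standard library. The helper name is_json_serializable is hypothetical and is not part of vasp.mongo.

```python
import json

def is_json_serializable(value):
    """Return True if json.dumps accepts value (hypothetical helper)."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False

# Plain numbers, strings, lists and dicts are fine.
extra = {'relaxed': False, 'source': 'dft-book', 'tags': ['molecules', 'H2']}
print(is_json_serializable(extra))

# Sets are not serializable; convert them first (here: set -> sorted list).
print(is_json_serializable({'tags': {'molecules', 'H2'}}))
print(is_json_serializable({'tags': sorted({'molecules', 'H2'})}))
```

A value that fails this check usually needs to be converted to a list, number, or string before it is stored.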

Start here: https://docs.mongodb.com/manual/

Query intro: https://docs.mongodb.com/manual/crud/#read-operations

Example queries

path tags

Find by pathtags, which are just the path split into its directory names. The order is not important.

The MongoDatabase initializer returns the database object. There is a db.collection attribute that is the actual collection you want to work on. There are a few thin wrappers for functions like find and count.

The find function returns a pymongo cursor, which is an iterator that yields documents. The documents are basically Python dictionaries.

import pprint
from vasp.mongo import MongoDatabase
db = MongoDatabase()

c = db.find({'calculator.pathtags': {'$all': ['O2-sp-triplet', 'molecules']}})
print(c.count())
pprint.pprint(next(c))
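To make the pathtags idea concrete, here is a small standard-library sketch of splitting a path into tags and checking them the way the '$all' operator does. How vasp.mongo actually builds the tags is an assumption here. The function names and the example path are illustrative only.

```python
def path_to_tags(path):
    """Split a POSIX path into its non-empty directory names (assumed tag scheme)."""
    return [part for part in path.split('/') if part]

def has_all_tags(tags, wanted):
    """Mimic '$all': every wanted tag must be present, in any order."""
    return set(wanted).issubset(tags)

# Illustrative path, not necessarily one stored in the database.
tags = path_to_tags('/home-research/jkitchin/dft-book/molecules/O2-sp-triplet')
print(tags)
print(has_all_tags(tags, ['O2-sp-triplet', 'molecules']))  # order does not matter
print(has_all_tags(tags, ['molecules', 'surfaces']))       # 'surfaces' is missing
```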

formula

Find by formula, say NH3. We query by element type and count, and we also specify natoms to avoid matching slabs with adsorbates of this composition.

from pprint import pprint
from vasp.mongo import MongoDatabase
db = MongoDatabase()

c = db.find({'atoms.symbol_counts.N': 1,
             'atoms.symbol_counts.H': 3,
             'atoms.natoms': 4})
print(c.count())
pprint(next(c))
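The following sketch shows why natoms matters. It builds the composition fields from a list of symbols with collections.Counter and compares an isolated NH3 with a made-up slab that carries one NH3. The field layout mirrors the query keys used here, but it is an assumption about how the documents are stored.

```python
from collections import Counter

def composition_fields(symbols):
    """Build composition fields from chemical symbols (assumed layout)."""
    return {'symbol_counts': dict(Counter(symbols)),
            'natoms': len(symbols)}

nh3 = composition_fields(['N', 'H', 'H', 'H'])
# A made-up slab: four Pt atoms with one adsorbed NH3.
slab = composition_fields(['Pt'] * 4 + ['N', 'H', 'H', 'H'])

for name, fields in [('nh3', nh3), ('slab', slab)]:
    counts = fields['symbol_counts']
    counts_match = counts.get('N') == 1 and counts.get('H') == 3
    # Without the natoms test both match; with it only the molecule does.
    print(name, counts_match, counts_match and fields['natoms'] == 4)
```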

Here we find calculations containing N and H.

from pprint import pprint
from vasp.mongo import MongoDatabase
db = MongoDatabase()

c = db.find({'atoms.chemical_symbols': {'$all': ['N', 'H']}})
print(c.count())

By a calc parameter

You can use dot notation to search for fields in subdocuments.

import numpy as np
from vasp.mongo import MongoDatabase

db = MongoDatabase()
c = db.find({'calculator.parameters.hfscreen': 0.2})
print(c.count())

# find special setups
c = db.find({'calculator.parameters.setups': {'$exists': True}})
print(c.count())
for doc in c: print(doc['calculator']['parameters']['setups'])

# find NEB calculations
c = db.find({'calculator.parameters.images': {'$exists': True}})
print(c.count())
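Dot notation can be pictured as following keys through nested dictionaries. The sketch below resolves a dotted key against a plain Python dict and gives a rough analogue of '$exists'. The helper names and the document fragment are made up for illustration. MongoDB does this matching on the server.

```python
_MISSING = object()

def get_field(doc, dotted_key):
    """Follow a dotted key through nested dicts; return _MISSING if absent."""
    value = doc
    for part in dotted_key.split('.'):
        if not isinstance(value, dict) or part not in value:
            return _MISSING
        value = value[part]
    return value

def field_exists(doc, dotted_key):
    """Rough analogue of {'$exists': True} for nested dicts."""
    return get_field(doc, dotted_key) is not _MISSING

# Made-up document fragment using field names that appear in this document.
doc = {'calculator': {'class': 'Vasp',
                      'parameters': {'encut': 350, 'kpts': [1, 1, 1]}}}

print(get_field(doc, 'calculator.parameters.encut'))
print(field_exists(doc, 'calculator.parameters.setups'))
```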

An equation of state

Here we filter by spacegroup to get a set of calculations we could use for an equation of state of fcc Cu. We match on a regular expression on the spacegroup since it is stored as a string with the number in parentheses.

import numpy as np
from vasp.mongo import MongoDatabase
db = MongoDatabase()

eos = db.find({'atoms.symbol_counts.Cu': 1, 'atoms.natoms': 1,
               'atoms.spacegroup': {'$regex': r'\(225\)'},
               'calculator.parameters.kpts': [8, 8, 8],
               'calculator.parameters.encut': 350},
              projection={'_id': 0, # do not show id
                          'calculator.pathtags': 1,
                          'calculator.energy': 1,
                          'atoms.volume': 1})

print(eos.count())
for c in eos: print(c)
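In a regular expression, unescaped parentheses form a group rather than matching literal characters. The sketch below uses Python's re module to show the difference. The example strings are assumptions about the stored format, and MongoDB evaluates '$regex' with its own engine, which behaves the same way for a pattern this simple.

```python
import re

# Assumed storage format: symbol followed by the number in parentheses.
spacegroup = 'Fm-3m (225)'

loose = re.compile('(225)')      # parentheses form a group; matches any '225'
strict = re.compile(r'\(225\)')  # escaped parentheses must appear literally

print(bool(loose.search(spacegroup)), bool(strict.search(spacegroup)))
# A made-up string that contains the digits but not the parenthesized number.
print(bool(loose.search('run-2250')), bool(strict.search('run-2250')))
```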

Check a calculator

This shows we can rebuild a calculator from the database.

from vasp.mongo import MongoDatabase
from vasp import Vasp

db = MongoDatabase()
c = next(db.find({'atoms.symbol_counts.O': 1}))

calc = Vasp(c['calculator']['path'], c['calculator']['parameters'])
print(calc)

Vasp calculation directory:
  /home-research/jkitchin/dft-book/blog/source/org/molecules/co-1.05

Unit cell:
       x       y       z      |v|
  v0   6.000   0.000   0.000   6.000 Ang
  v1   0.000   6.000   0.000   6.000 Ang
  v2   0.000   0.000   6.000   6.000 Ang
  alpha, beta, gamma (deg): 90.0 90.0 90.0
  Total volume: 216.000 Ang^3
  Stress:     xx      yy      zz      yz      xz      xy
          -0.060   0.011   0.011  -0.000  -0.000  -0.000 GPa

  ID  tag  sym  x      y      z      rmsF (eV/A)
  0   0    C    0.000  0.000  0.000  14.93
  1   0    O    1.050  0.000  0.000  14.93
  Potential energy: -14.2158 eV

INPUT Parameters:
  lcharg : False
  pp     : PBE
  nbands : 6
  xc     : pbe
  ismear : 1
  lwave  : False
  sigma  : 0.01
  kpts   : [1, 1, 1]
  encut  : 350

Pseudopotentials used:
  C: potpaw_PBE/C/POTCAR (git-hash: ee4d8576584f8e9f32e90853a0cbf9d4a9297330)
  O: potpaw_PBE/O/POTCAR (git-hash: 592f34096943a6f30db8749d13efca516d75ec55)

A special setup calculator

from vasp.mongo import MongoDatabase
from vasp import Vasp

db = MongoDatabase()
atoms = next(db.get_atoms({'calculator.path': '/home-research/jkitchin/dft-book/molecules/O_s'}))
calc = atoms.get_calculator()
print(calc)

Vasp calculation directory:
  /home-research/jkitchin/dft-book/molecules/O_s

Unit cell:
       x       y       z      |v|
  v0   6.000   0.000   0.000   6.000 Ang
  v1   0.000   6.000   0.000   6.000 Ang
  v2   0.000   0.000   6.000   6.000 Ang
  alpha, beta, gamma (deg): 90.0 90.0 90.0
  Total volume: 216.000 Ang^3
  Stress:     xx      yy      zz      yz      xz      xy
           0.001   0.001   0.001  -0.000  -0.000  -0.000 GPa

  ID  tag  sym  x      y      z      rmsF (eV/A)
  0   0    O    5.000  5.000  5.000  0.00
  Potential energy: -1.5056 eV

INPUT Parameters:
  magmom : [1.0]
  pp     : PBE
  setups : 'O', '_s'
  kpts   : [1, 1, 1]
  encut  : 300
  lcharg : False
  xc     : pbe
  ispin  : 2
  ismear : 0
  lwave  : False
  sigma  : 0.001
  lorbit : 11

Pseudopotentials used:
  O: potpaw_PBE/O_s/POTCAR (git-hash: b4bfc67547c457885a1cc949eeda825354a6520a)

calc with rwigs

from vasp.mongo import MongoDatabase
from vasp import Vasp

db = MongoDatabase()
atoms = next(db.get_atoms({'calculator.path': '/home-research/jkitchin/dft-book/molecules/co-ados'}))
calc = atoms.get_calculator()
print(calc)

Vasp calculation directory:
  /home-research/jkitchin/dft-book/molecules/co-ados

Unit cell:
       x       y       z      |v|
  v0   6.000   0.000   0.000   6.000 Ang
  v1   0.000   6.000   0.000   6.000 Ang
  v2   0.000   0.000   6.000   6.000 Ang
  alpha, beta, gamma (deg): 90.0 90.0 90.0
  Total volume: 216.000 Ang^3
  Stress:     xx      yy      zz      yz      xz      xy
           0.060   0.027   0.027  -0.000  -0.000  -0.000 GPa

  ID  tag  sym  x      y      z      rmsF (eV/A)
  0   0    C    0.000  0.000  0.000  5.14
  1   0    O    1.200  0.000  0.000  5.14
  Potential energy: -14.7178 eV

INPUT Parameters:
  lcharg : False
  pp     : PBE
  kpts   : [1, 1, 1]
  xc     : pbe
  ismear : 1
  lwave  : False
  sigma  : 0.1
  rwigs  : {'C': 1.0, 'O': 1.0}
  encut  : 300

Pseudopotentials used:
  C: potpaw_PBE/C/POTCAR (git-hash: ee4d8576584f8e9f32e90853a0cbf9d4a9297330)
  O: potpaw_PBE/O/POTCAR (git-hash: 592f34096943a6f30db8749d13efca516d75ec55)

By a bond length

Find by C-O bond length: say we want C-O bond lengths of at most 1.2 angstroms. This is not an easy query to do in the database. Instead, we get all documents that contain at least one C and one O, and use Python to filter the matches.

import numpy as np
from vasp.mongo import MongoDatabase
db = MongoDatabase()

all_atoms = db.get_atoms({'atoms.symbol_counts.C': {'$gte': 1},
                          'atoms.symbol_counts.O': {'$gte': 1}})

def bond_length_filter(atoms, bond_length=1.2):
    "Return True if any C-O distance in atoms is at most bond_length (angstroms)."
    C = [atom for atom in atoms if atom.symbol == 'C']
    O = [atom for atom in atoms if atom.symbol == 'O']
    for catom in C:
        for oatom in O:
            # Euclidean distance between the two atoms (no periodic images).
            d = np.linalg.norm(catom.position - oatom.position)
            if d <= bond_length:
                return True
    return False

A = [atoms for atoms in all_atoms if bond_length_filter(atoms)]
print(len(A))
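The same distance test can be tried without a database or ASE by working on plain symbol and position lists. This is a standard-library sketch. The function name is hypothetical, and no periodic boundary conditions are applied. The first geometry uses the 1.05 angstrom C-O separation from the calculator output earlier in this document; the second is made up.

```python
import math
from itertools import product

def min_co_distance(symbols, positions):
    """Return the shortest C-O distance, or None if C or O is missing.

    No periodic boundary conditions are applied (a simplifying assumption).
    """
    carbons = [p for s, p in zip(symbols, positions) if s == 'C']
    oxygens = [p for s, p in zip(symbols, positions) if s == 'O']
    distances = [math.dist(c, o) for c, o in product(carbons, oxygens)]
    return min(distances) if distances else None

short = min_co_distance(['C', 'O'], [(0, 0, 0), (1.05, 0, 0)])
long_ = min_co_distance(['C', 'O'], [(0, 0, 0), (1.3, 0, 0)])
print(short, short <= 1.2)
print(long_, long_ <= 1.2)
```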

List all the pathtags

Here we have to use db.collection to access the distinct command. You can always use db.collection directly; it is just a little longer.

import numpy as np
from vasp.mongo import MongoDatabase

db = MongoDatabase()
c = db.collection.distinct('calculator.pathtags', {})
print(c)

update a record

Mongo provides update and findAndModify functions. Here is an example with update. Note that it is possible to update many documents at a time; here we query by the full calculation path to target a single calculation.

from vasp.mongo import MongoDatabase
from bson.objectid import ObjectId

db = MongoDatabase()

db.collection.update({'calculator.path': '/home-research/jkitchin/dft-book/molecules/nh3-initial'},
                     {'$set': {'special_tags': ['initial-state']}})

# this is how to add a tag to the tags array
db.collection.update({'calculator.path': '/home-research/jkitchin/dft-book/molecules/nh3-initial'},
                     {'$addToSet': {'special_tags': {'$each': ['neb', 'initial-state']}}})

c = db.find({'calculator.path': '/home-research/jkitchin/dft-book/molecules/nh3-initial'},
            projection={'special_tags': 1})

import pprint
pprint.pprint(next(c))
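The difference between the two update operators can be emulated on a plain dictionary. This sketch only mimics the semantics of '$set' and of '$addToSet' with '$each' for a single array field. The function names are hypothetical and nothing here talks to a server.

```python
def apply_set(doc, field, value):
    """Emulate {'$set': {field: value}}: replace the field outright."""
    doc[field] = value

def apply_add_to_set_each(doc, field, values):
    """Emulate {'$addToSet': {field: {'$each': values}}}: append only new values."""
    current = doc.setdefault(field, [])
    for value in values:
        if value not in current:
            current.append(value)

doc = {}
apply_set(doc, 'special_tags', ['initial-state'])
apply_add_to_set_each(doc, 'special_tags', ['neb', 'initial-state'])
print(doc['special_tags'])  # ['initial-state', 'neb']
```

Because 'initial-state' is already present, only 'neb' is appended by the second operation.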

Find calculations with FixAtoms constraints

from vasp.mongo import MongoDatabase
from bson.objectid import ObjectId

db = MongoDatabase()



c = db.find({'atoms.constraints.name': 'FixAtoms'})

print(c.count())
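When a dotted key passes through an array of subdocuments, MongoDB matches a document if any element of the array satisfies the condition. The sketch below emulates that on plain dictionaries. The layout of 'atoms.constraints' as a list of dicts with a 'name' key is an assumption, and the documents are made up.

```python
def any_constraint_named(doc, name):
    """Emulate {'atoms.constraints.name': name} for a list of constraint dicts."""
    constraints = doc.get('atoms', {}).get('constraints', [])
    return any(c.get('name') == name for c in constraints)

# Made-up documents; the real layout of 'atoms.constraints' is an assumption.
fixed = {'atoms': {'constraints': [{'name': 'FixAtoms', 'indices': [0, 1]}]}}
free = {'atoms': {'constraints': []}}

print(any_constraint_named(fixed, 'FixAtoms'))
print(any_constraint_named(free, 'FixAtoms'))
```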

An example to walk a directory and add calculations

This defines a function that usually recognizes a finished Vasp directory (it fails on NEB directories). We then walk a directory tree and add each Vasp directory that is not already in the database.

import os
from vasp import *

from vasp.vasprc import VASPRC
VASPRC['mode'] = None

def vasp_p(directory):
    'Return True if directory contains an INCAR and a finished OUTCAR, else False.'
    outcar = os.path.join(directory, 'OUTCAR')
    incar = os.path.join(directory, 'INCAR')
    if os.path.exists(outcar) and os.path.exists(incar):
        with open(outcar, 'r') as f:
            contents = f.read()
            if 'General timing and accounting informations for this job:' in contents:
                return True
    return False

from vasp.mongo import MongoDatabase, mongo_doc
db = MongoDatabase()

for root, dirs, files in os.walk('/home-research/jkitchin/dft-book'):
    for d in dirs:
        # compute absolute path to each directory in the current root
        absd = os.path.join(root, d)

        if (vasp_p(absd)
            # the test dir had some problems.
            and 'test' not in absd
            # Don't add things already in
            and db.find({"calculator.path": absd}).count() == 0):
            # we found a vasp directory, so we can do something in it.
            # here we add it to the ase MongoDB

            calc = Vasp(absd)
            atoms = calc.get_atoms()
            db.write(mongo_doc(atoms), source="dft-book")
            print('added {}'.format(absd))
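OUTCAR files can be large, so a variant of the check could read only the end of the file. This sketch assumes the timing marker is written near the end of a finished OUTCAR. The function name outcar_finished and the tail_bytes default are hypothetical choices, and the demonstration writes a fake file to a temporary directory.

```python
import os
import tempfile

MARKER = b'General timing and accounting informations for this job:'

def outcar_finished(outcar, tail_bytes=4096):
    """Return True if the timing marker appears in the last tail_bytes of outcar.

    Assumes the marker is written near the end of a finished OUTCAR.
    """
    if not os.path.exists(outcar):
        return False
    with open(outcar, 'rb') as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail_bytes))  # jump to the start of the tail
        return MARKER in f.read()

# Demonstration with a fake OUTCAR: filler bytes followed by the marker.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'OUTCAR')
    with open(path, 'wb') as f:
        f.write(b'x' * 10000 + b'\n ' + MARKER + b'\n')
    print(outcar_finished(path))
```

If the marker could sit further from the end in your files, increase tail_bytes or fall back to reading the whole file.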

Other things you might do

derived documents

The idea here is that you could store a derived quantity, such as an adsorption energy, with links to the documents it was computed from. Here is an example of getting an adsorption energy.

import numpy as np
from vasp.mongo import MongoDatabase

db = MongoDatabase()

clean = db.collection.find_one({'calculator.pathtags': {'$all': ['surfaces', 'Pt-slab']}})
oslab = db.collection.find_one({'calculator.pathtags': {'$all': ['surfaces', 'Pt-slab-O-fcc']}})
o2 = db.collection.find_one({'calculator.pathtags': {'$all': ['molecules', 'O2-sp-triplet-350']}})

print(clean['_id'])
print(oslab['calculator']['energy'] - clean['calculator']['energy'] - 0.5 * o2['calculator']['energy'])

As a document, you could store something like this. This is a loose thought; the pseudo-example below should also include the _id for each calculation so you know where it came from. Maybe there is some JSON-like way of storing variables. Alternatively, you could store a Python script that does the calculation, along with its result.

{"+": ["o_slab_energy", {"*": [-1, "clean_slab_energy"]}, {"*": [-0.5, "o2_energy"]}]}
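One way to make the expression idea concrete is a small evaluator for nested operator dictionaries. This is only a sketch of the loose thought above, not something vasp.mongo provides. The operator table, the variable names, and the energies are all made up for illustration.

```python
import math

# Supported operators (an illustrative choice): sum and product over the arguments.
OPERATORS = {'+': sum, '*': math.prod}

def evaluate(expr, variables):
    """Evaluate a nested {"op": [args...]} expression.

    Strings are looked up in variables; numbers are used as-is.
    """
    if isinstance(expr, str):
        return variables[expr]
    if isinstance(expr, (int, float)):
        return expr
    (op, args), = expr.items()  # exactly one operator per dict
    return OPERATORS[op](evaluate(arg, variables) for arg in args)

# Adsorption energy: E(O/slab) - E(slab) - 0.5 * E(O2)
expr = {'+': ['o_slab_energy',
              {'*': [-1, 'clean_slab_energy']},
              {'*': [-0.5, 'o2_energy']}]}

# Made-up energies in eV, for illustration only.
energies = {'clean_slab_energy': -10.0, 'o_slab_energy': -15.0, 'o2_energy': -8.0}
print(evaluate(expr, energies))  # -1.0
```

In a stored document the variable names could be replaced by, or accompanied with, the _id of each source calculation.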

Store trajectories

You can build up the document any way you want and store it.