[Feature request]: Refactor MaterialsProjectDataset to not serialize pymatgen Structures in LMDB #267
Labels
code maintenance: Issue/PR for refactors, code clean up, etc.
data: Issues related to data loading, pipelining, etc.
enhancement: New feature or request
Feature/behavior summary
Currently, the workflow implemented for MaterialsProjectDataset saves and reloads a pymatgen.Structure object. The problem is that this ties the stored data very closely to the installed version of pymatgen: small API changes can make it difficult to reload the dataset with later versions.
Request attributes
Related issues
No response
Solution description
If we refactor it so that Structures are created at load time, in line with other dataset implementations, we would break this dependency. The change would be breaking, however: we would have to re-process the existing LMDBs being distributed and make sure that the data is stored as just plain coordinates, atoms, and lattice parameters.
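A minimal sketch of the proposed storage layout, assuming each LMDB value holds only plain Python types (lattice parameters, atomic numbers, fractional coordinates) and the Structure is rebuilt only when a sample is loaded. The names PlainStructureRecord, encode_record, decode_record, from_pymatgen, and to_pymatgen are hypothetical, not part of MaterialsProjectDataset, and JSON is used purely for illustration; pickling a dict of plain lists or NumPy arrays would also avoid the pymatgen dependency. The conversion helpers assume an ordered structure (one element per site).

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class PlainStructureRecord:
    """Hypothetical on-disk record: no pymatgen objects, only plain values."""

    lattice_params: Tuple[float, ...]  # (a, b, c, alpha, beta, gamma) in Å and degrees
    atomic_numbers: List[int]  # one atomic number per site
    frac_coords: List[List[float]]  # fractional coordinates, one [x, y, z] per site

    def __post_init__(self) -> None:
        # Basic consistency checks so malformed records fail early.
        if len(self.lattice_params) != 6:
            raise ValueError("lattice_params must be (a, b, c, alpha, beta, gamma)")
        if len(self.atomic_numbers) != len(self.frac_coords):
            raise ValueError("need exactly one coordinate triple per atom")


def encode_record(record: PlainStructureRecord) -> bytes:
    """Serialize to bytes suitable as an LMDB value (e.g. for txn.put)."""
    return json.dumps(asdict(record)).encode("utf-8")


def decode_record(raw: bytes) -> PlainStructureRecord:
    """Inverse of encode_record; JSON lists are turned back into a tuple."""
    data = json.loads(raw.decode("utf-8"))
    return PlainStructureRecord(
        lattice_params=tuple(data["lattice_params"]),
        atomic_numbers=list(data["atomic_numbers"]),
        frac_coords=[list(xyz) for xyz in data["frac_coords"]],
    )


def from_pymatgen(structure) -> PlainStructureRecord:
    """One-off conversion used when re-processing the existing LMDBs.

    Assumes an ordered pymatgen Structure (atomic_numbers fails otherwise).
    """
    return PlainStructureRecord(
        lattice_params=tuple(float(x) for x in structure.lattice.parameters),
        atomic_numbers=[int(z) for z in structure.atomic_numbers],
        frac_coords=structure.frac_coords.tolist(),
    )


def to_pymatgen(record: PlainStructureRecord):
    """Build the Structure at load time, with whatever pymatgen is installed."""
    from pymatgen.core import Lattice, Structure  # imported lazily, only here

    lattice = Lattice.from_parameters(*record.lattice_params)
    # Coordinates are interpreted as fractional (the Structure default).
    return Structure(lattice, record.atomic_numbers, record.frac_coords)
```

Because the stored bytes contain no pymatgen class references, a pymatgen upgrade only affects to_pymatgen, which runs at load time, rather than the distributed LMDB files themselves.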
Additional notes
No response