MD simulation extension #218

Open
wants to merge 4 commits into
base: upcoming-2.0.0
137 changes: 137 additions & 0 deletions EXT_MD.md
@@ -0,0 +1,137 @@
Domain-Specific Naming Conventions for Molecular Dynamics Simulation Codes
==========================================================================

openPMD extension name: `MD`

VERSION: 0.0.3 (January 26th, 2020)

Introduction
------------

This extension is specifically designed for molecular dynamics simulation codes.

The current version of this extension is suitable to allow the output of
arbitrary simulation codes to be post-processed and compared with common
tools and frameworks. Future versions will define a common set of required
records and further attributes.

The example data structure can be found [HERE](https://github.com/ejcjason/MDDomainExtension).

Root Group
----------

### Additional Attributes for the *Root* (`/`) Group

The following additional attributes are defined in this extension.
The individual requirement is given in `scope`.

- `forceField`
Member

for forceField and forceFieldParameters, do you want to define the syntax of the values further or keep values as a human-readable-only free text?
You can check out the SpeciesType extension for ideas how to define syntax and alternatives.

Author

The ideal case is that there could be a well-defined syntax to describe the forceField, which MD simulation codes could read directly. But in practice, different forceFields have different parameters. For example, the parameter syntax differs between the LJ potential and the EAM potential.

Since I do not understand all the potentials used in MD simulation, maybe I can start by defining the syntax for the potentials I do understand? For any other potential, the value would just be human-readable-only free text.

Should we also ask the opinion of some other experts about this?

Member

Sure, feel free to reach out. Otherwise you can also just keep it undefined for now.

You can always add more attributes in openPMD and if you arrive at something that is worth standardizing for those attributes, then we can just specify them later.

Reviewer

Actually, I don't particularly like this constraint in the name. ;)

Consider a case where one wishes to find the ground state of an atomic configuration. This is not a forceField but just a method. Would this make sense to change?

Member

@ax3l ax3l Jan 25, 2022

What do you think @JunCEEE? Sounds reasonable to me, if storing the used method is the intent for this attribute :)

Note that if we use a very general attribute name like method, we need to add a better description to make clear what we mean :)

  - type: 1-dimensional array of N (string) elements, where N is the number of force fields (interatomic potentials) implemented in the simulation.
  - scope: *optional*
  - description: the methods implemented in the simulation to describe the force fields (interatomic potentials). See [Interatomic Potentials Repository](https://www.ctcms.nist.gov/potentials/).
  - example values:
    - `eam/alloy`
    - `lj/cut 3.0`
    - ...
- `forceFieldParameters`
  - type: 1-dimensional array of N (string) elements, where N is the number of force fields (interatomic potentials) implemented in the simulation.
  - scope: *optional*
  - description: the parameters specification for the `forceField` methods. See [Interatomic Potentials Repository](https://www.ctcms.nist.gov/potentials/).
  - example values:
    - `pair_coeff * * 1 1`
    - `pair_coeff 1 1 Cu_mishin1.eam.alloy Cu`
    - ...
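
As an illustration only, the two root attributes could be checked with a small helper like the one below. This is a minimal sketch, not part of the extension: the function name `check_force_field_attributes` is hypothetical, and pairing the example values by index (entry *i* of `forceFieldParameters` belonging to entry *i* of `forceField`) is an assumption suggested by both arrays having N elements.

```python
def check_force_field_attributes(force_field, force_field_parameters):
    """Return a list of problems with the two root attributes (empty if none)."""
    problems = []
    for name, value in (("forceField", force_field),
                        ("forceFieldParameters", force_field_parameters)):
        # Each attribute is a 1-dimensional array of strings.
        if not isinstance(value, (list, tuple)):
            problems.append(f"{name} must be a 1-dimensional array")
        elif not all(isinstance(item, str) for item in value):
            problems.append(f"{name} must contain only strings")
    # Both arrays are specified with N elements (one per force field).
    if not problems and len(force_field) != len(force_field_parameters):
        problems.append("forceField and forceFieldParameters differ in length")
    return problems


# Example values from the lists above; pairing them by index is an assumption.
root_attributes = {
    "forceField": ["eam/alloy", "lj/cut 3.0"],
    "forceFieldParameters": [
        "pair_coeff 1 1 Cu_mishin1.eam.alloy Cu",
        "pair_coeff * * 1 1",
    ],
}
print(check_force_field_attributes(
    root_attributes["forceField"], root_attributes["forceFieldParameters"]))
```

Because both attributes are *optional*, a real reader would presumably run such a check only when both are present.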

Observable Records
------------------

`observables` is an *optional* group that contains physical observables that are derived from the system state, i.e., thermodynamic information (e.g. temperature, energy, pressure).
Member

Just to clarify: Where is this group located? in the root (/) path as well? Or the basePath? Or in meshesPath/particlesPath?

Member

@JunCEEE let's clarify this as well


### Naming conventions

The naming conventions conform to those for [Scalar, Vector and Tensor Records](STANDARD.md#naming-conventions) in the [openPMD base standard](STANDARD.md).

- `scalar` record
  - type: *(any type)*
  - data set: `recordName` unique name in group `basePath` + `observables`
  - examples:
    - `/data/observables/temperature`
    - `/data/observables/pressure`
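
The path rule above (`basePath` + `observables` + `recordName`) can be sketched as follows. This is a hedged illustration: the helper name `observable_path` is hypothetical, and the default `base_path="/data/"` is an assumption taken from the example paths, not a value fixed by this extension.

```python
import posixpath


def observable_path(record_name, base_path="/data/", group="observables"):
    """Build the data set path `basePath` + `observables` + `recordName`."""
    # A record name is a single path component, so reject separators.
    if not record_name or "/" in record_name:
        raise ValueError("recordName must be a non-empty name without '/'")
    return posixpath.join(base_path, group, record_name)


print(observable_path("temperature"))
print(observable_path("pressure"))
```

Using `posixpath` rather than `os.path` keeps the separator as `/` on every platform, which matches how the example paths are written.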

### Attributes for each `observable` record

The unit-system attributes for records should be included as defined in the [openPMD base standard](STANDARD.md#unit-systems-and-dimensionality).


Particle Records
----------------

### Additional attributes for each `position` record

The following attributes are defined in this extension. The individual requirement is given in `scope`.

- `positionFormat`
  - type: *(string)*
  - scope: *required*
Reviewer

Ok, another ping here ;)

Since generally position is considered absolute wouldn't it make sense to make this optional, thus defaulting to absolute?

Author

It would be a good idea for a program that writes in this format. However, as a data format, I feel it's better to make the positionFormat clear in metadata to ensure the data is self-described. Thus I tend to keep it required.

What do you think?

Reviewer

I don't know what rules the openPMD community has for these somewhat optional things. I don't mind having it required; it is just weird given that the openPMD specification without the MD simulation extension does not have this. If anything, this format shouldn't belong to the MD extension.

Author

I don't know if I understand correctly, but there is indeed the scope with optional in the "PIC extension" which I take as a template. And there are actually some records belonging to the topic of MD simulation but not essential (e.g. observables).

Maybe @ax3l can comment on this?

Member

@ax3l ax3l Jan 25, 2022

Thanks for thinking about this!

What I understand is that this attribute tries to modify how we interpret position records. While we generally can do such modifications in extensions, they might be a bit tricky to keep neatly compatible with the base standard.

For the specific task here, it looks to me like you are trying to achieve a "fine" and a "coarse" position, similar to what we do in some particle-in-cell codes. For this purpose, the base standard already has the "position" and "positionOffset" records defined. Could we potentially use exactly that mechanism here or does it differ?

As an example, in PIConGPU, we define positionOffset as a cell index, scaled via unitSI back to m. Then we use position with values in [0,1) to store the fine-grained position in a cell. The unitSI for position can be a different value than the one for positionOffset.

For data sets that do not need this splitting in fine and coarse positions, we define positionOffset to zero values (constant record).

  - description: the format of the stored position coordinates
  - available values:
    - `absolute` the unscaled coordinates
    - `fractional` the coordinates that are scaled in the range of [0,1] relative to the length of each box edge; in this coordinate system, the `unitSI` of each position component should be `1.0`
Member

For fractional, do you intentionally include the upper value? (E.g. [0.0, 1.0] vs. [0.0, 1.0)) Just double-checking.
Formatting suggestion:

Suggested change
- `fractional` the coordinates that are scaled in the range of [0,1] relative to the length of each box edge; in this coordinate system, the `unitSI` of each position component should be `1.0`
- `fractional` the coordinates that are scaled in the range of `[0.0, 1.0]` relative to the length of each box edge; in this coordinate system, the `unitSI` of each position component should be `1.0`

Author

I didn't think about it deeply. But intuitively, for an atom located at the corner of the simulation box, its fractional coordinate is [1.0,1.0,1.0]. Thus I set the range to be [0.0,1.0].

Please correct me if there are some occasions I missed.

Member

@ax3l ax3l Feb 4, 2021

Ok, I am just wondering about double-counting @JunCEEE - when a particle belongs to one box or another. But maybe I misunderstood and there is only one (simulation) box here.

Reviewer

I agree with @JunCEEE here. It may be valuable to have an atom at the box edge on one side and not the other. I guess the code should itself check for mirrored positions. Especially considering that not all directions need to be periodic, [0 ; 1] seems like the correct choice here.
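
To make the `fractional` format discussed above concrete, here is a minimal sketch of converting a fractional position to an absolute one. It rests on assumptions that this extension does not state explicitly: each row of the box `edge` record is taken as a full edge vector (direction and length), the box origin is taken to be the coordinate origin, and the closed range [0, 1] is enforced. The function name `fractional_to_absolute` is hypothetical.

```python
def fractional_to_absolute(fractional, edge):
    """Map fractional coordinates to absolute ones: r = sum_i f_i * a_i."""
    dim = len(edge)
    if len(fractional) != dim:
        raise ValueError("position and edge must have the same dimension")
    for value in fractional:
        # Closed interval, so an atom may sit exactly on the far box face.
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractional coordinates must lie in [0, 1]")
    # Component k of the result sums the k-th components of the scaled edges.
    return [sum(fractional[i] * edge[i][k] for i in range(dim))
            for k in range(dim)]


triclinic = [[3.46, 0.0, 0.0], [1.73, 2.997, 0.0], [0.0, 0.0, 10.0]]
print(fractional_to_absolute([0.5, 0.5, 0.0], triclinic))
```

With a half-open range `[0, 1)` instead, only the upper bound check would change; the mapping itself stays the same.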


### Additional Sub-Group for each Particle Species

`box` is an *optional* sub-group for each particle species to contain the information of the simulation box.
Reviewer

Why is the box optional for each particle species? I mean, the box is defined globally for the entire simulation, not per particle species?


### Attributes for Sub-Group `box`

The following attributes are defined in this extension.
If the `box` sub-group exists, the following attributes within it are required.
The individual requirement is given in `scope`.

- `dimension`
  - type: *(uint32)*
  - scope: *required*, if `box` sub-group exists
  - description: the spatial dimension **D** of the simulation box.
  - example values:
    - `2` 2D simulation box
    - `3` 3D simulation box
    - ...

- `boundary`
  - type: array of *(string)* containing **D** elements, where **D** is the value of `dimension`.
  - scope: *required*, if `box` sub-group exists
  - description: the boundary condition of the box along each dimension.
  - allowed values:
    - `none`
    - `periodic`
    - `dirichlet`
    - `neumann`
  - example values:
    - `["periodic","periodic","periodic"]` periodic in all the three dimensions
    - `["none","periodic","periodic"]` periodic in only the second and third dimensions
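
A reader of such a file might check the two `box` attributes along these lines. This is only a sketch: the helper name `check_box_attributes` is hypothetical, and representing the attributes as a plain Python integer and list is an assumption made for illustration.

```python
ALLOWED_BOUNDARIES = {"none", "periodic", "dirichlet", "neumann"}


def check_box_attributes(dimension, boundary):
    """Return a list of problems with `dimension` and `boundary` (empty if none)."""
    # `dimension` is a uint32, so it must be a non-negative integer.
    if isinstance(dimension, bool) or not isinstance(dimension, int) or dimension < 0:
        return ["dimension must be a non-negative integer"]
    problems = []
    # `boundary` holds exactly D strings, one per dimension.
    if len(boundary) != dimension:
        problems.append(f"boundary must contain {dimension} elements")
    for axis, value in enumerate(boundary):
        if value not in ALLOWED_BOUNDARIES:
            problems.append(f"boundary[{axis}] has unsupported value {value!r}")
    return problems


print(check_box_attributes(3, ["none", "periodic", "periodic"]))
print(check_box_attributes(2, ["periodic", "reflecting"]))
```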

### Records for Sub-Group `box`

The following records are defined in this extension.
Member

Do you want to store these small data sets as records or as attributes? Both are fine, just be aware that attributes can only be scalars or 1D-arrays, due to limitations in some file formats.

Author

Yes, I would like to use records here.

If the `box` sub-group exists, the following records within it are required.
The individual requirement is given in `scope`.
- `edge`
  - type: DxD array of *(floatX)*, where **D** is the value of `dimension`.
  - scope: *required*, if `box` sub-group exists
  - description: the edge direction vector of the simulation box in each dimension.
  - example values:
    - `[[1.,0.,0.],[0.,1.,0.],[0.,0.,1.]]` 3D orthorhombic simulation box, Ax = (1.,0.,0.), Ay = (0.,1.,0.), Az = (0.,0.,1.)
    - `[[3.46,0.,0.],[1.73,2.997,0.],[0.,0.,10.]]` 3D triclinic simulation box, Ax = (3.46,0.,0.), Ay = (1.73,2.997,0.), Az = (0.,0.,10.)
    - `[[1.,0.],[0.,1.]]` 2D rectangle simulation box, Ax = (1.,0.), Ay = (0.,1.)
- `limit`
Reviewer

I don't understand the difference between limit and edge. Are they both required?
Why is limit needed? Could you make an example?

Author

edge is a collection of the edge direction vector of each dimension of the box. It basically defines the direction of each edge/dimension of the box. edge=[[1.,0.,0.],[0.,1.,0.],[0.,0.,1.]] means a 3D orthorhombic (instead of cubic, my fault.) simulation box. edge= [[3.46,0.,0.],[1.73,2.997,0.],[0.,0.,10.]] means a 3D triclinic simulation box.

limit defines the starting and ending point on each edge of the box. It basically defines the size and position of the box.

Maybe, still, the naming is a bit confusing. It would be nice if you could suggest something better.

Reviewer

I still don't understand! ;) You write exactly what the document says.
Why isn't edge fully explaining the box? What does limit tell you that edge doesn't?
Could you make a picture (2D I guess would suffice)?

Reviewer

I.e. what is the relation between edge and limit?

Author

Thanks for asking. Actually, I just realized the problem of this definition.

The idea was inspired by xhi and xlo in https://docs.lammps.org/create_box.html. However, I didn't really implement the idea clearly in this definition. What I really need (as shown in the graph) is actually edge for length and direction, and origin instead of limit for the origin of the box in Cartesian coordinates. Please let me know what you think if we implement it this way.
[image]

Author

Agree. I think we should replace edge with vectors.

In addition to that, I'd also introduce a lengths keyword to define the length of box edges to ease the defining and changing of box sizes (vectors will only represent the direction.). Considering a box with vectors = [[3.46,0.,0.],[1.73,2.997,0.],[0.,0.,10.]], it's not straightforward to change the box size to 5x5x5 using only vectors representing both directions and lengths. With the lengths keyword, this will be more straightforward and practical.

It will be like this:
https://github.com/JunCEEE/openPMD-standard/blob/85369d1dd4119bd27413de90a64caccaac604520/EXT_MD.md?plain=1#L125-L139

What do you think?

Reviewer

I don't think length should be there at all. The vectors should contain the length of the vectors.
I don't understand your:

Considering a box with vectors = [[3.46,0.,0.],[1.73,2.997,0.],[0.,0.,10.]], it's not straightforward to change the box size to 5x5x5 using only vectors representing both directions and lengths. With the lengths keyword, this will be more straightforward and practical.

I really have no clue what you mean here? Do you just want each vector to have a length of 5?

One parameter is more than enough to cover all cases.

My problem with your proposal is that it requires users to do this: 1) read vectors, 2) normalize vectors, 3) read lengths, 4) scale vectors
With only 1 parameter you just read it.

Author

Thanks. lengths is intended to help users change the box size easily: they only change the lengths without recalculating the vectors, and sizes are changed more frequently than vector directions. But surely your comment also makes sense. User-friendliness can also be implemented at the user-interface level. If we follow the principle of storing only essential information, we should set only one parameter.

Reviewer

Ok. I think this should be a data format. Not a convenience format for simulators. :) So keeping things simple is much more important to me ;)

Member

Sounds like you arrived at a solution :)

  - type: Dx2 array of *(floatX)*, where **D** is the value of `dimension`.
  - scope: *required*, if `box` sub-group exists
  - description: the start and end points along each edge vector.
  - example values:
    - `[[0.,300.],[0.,150.],[0.,180.]]` A 3D box example: xlo = 0, xhi = 300, ylo = 0, yhi = 150, zlo = 0, zhi = 180
    - `[[0.,300.],[15.,280.]]` A 2D box example: xlo = 0, xhi = 300, ylo = 15, yhi = 280
- `unitSI`
  - type: *(float64)*
  - scope: *required*, if `box` sub-group exists
  - description: unit-conversion factor to convert simulation units to SI units
  - example: `1.0e-10`
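
As a rough illustration of how the `box` records might be consumed, the sketch below derives per-dimension extents from `limit` and edge lengths from `edge`. Both helper names are hypothetical. It is an assumption here that the `box`-level `unitSI` applies to the `limit` values; the text above only says that it converts simulation units to SI units.

```python
import math


def limit_extents_si(limit, unit_si):
    """Extent (end minus start) along each dimension, converted to SI units."""
    return [(end - start) * unit_si for start, end in limit]


def edge_lengths(edge):
    """Euclidean length of each row of the DxD `edge` record."""
    return [math.sqrt(sum(c * c for c in vector)) for vector in edge]


# 2D `limit` example from the list above, with unitSI = 1.0e-10.
print(limit_extents_si([[0.0, 300.0], [15.0, 280.0]], 1.0e-10))
# 3D triclinic `edge` example from the list above.
print(edge_lengths([[3.46, 0.0, 0.0], [1.73, 2.997, 0.0], [0.0, 0.0, 10.0]]))
```

If the rows of `edge` already carry their lengths, as the triclinic example suggests, the lengths are recoverable from the vectors alone, which is the argument made in the review thread for storing a single parameter.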