Id-like dataname questions? #462

Open
rowlesmr opened this issue Oct 15, 2023 · 1 comment

Comments

@rowlesmr
Collaborator

I've been looking at category keys for various reasons, and have happened upon some questions:

  • DIFFRN_REFLN: Keyed on _diffrn_refln.hkl, which is a Matrix of hkl values. Other categories (e.g. DIFFRN_ORIENT_REFLN) are keyed on the three indices individually.
    • should this be changed?
  • CHEMICAL_CONN_BOND: Keyed on .atom_1 and .atom_2, but also has id, described as a "Unique identifier for the bond". The .id dataname isn't referred to anywhere else in core. The same applies to GEOM_ANGLE, GEOM_BOND, GEOM_CONTACT, GEOM_HBOND, GEOM_TORSION, and MODEL_SITE. Some of these are understandable, as there are many key datanames (looking at you, GEOM_TORSION).
    • Should having a single, unique, non-key identifier be a policy where there exists more than one dataname in a category key?
@jamesrhester
Contributor

All of the REFLN-type categories should be keyed by a separate id type, as we cannot guarantee that hkl values are unique. This is a real problem for modulated structures (hklmnop...) and raw data (same peak collected more than once). I've been putting this off, but it needs to be discussed and done.
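
To make the uniqueness problem concrete, here is a minimal Python sketch. It is not taken from any dictionary or CIF library: the rows, the `duplicate_keys` helper, and the sequential ids are all hypothetical. It shows that rows keyed on hkl collapse when the same peak is collected twice, while a separate id keeps every row.

```python
from collections import Counter

# Hypothetical raw-data rows: (h, k, l, intensity).
# The (1, 0, 0) peak has been collected twice.
measurements = [
    (1, 0, 0, 120.5),
    (0, 1, 0, 98.2),
    (1, 0, 0, 119.8),
]


def duplicate_keys(rows):
    """Return the hkl triples that occur in more than one row."""
    counts = Counter((h, k, l) for h, k, l, _ in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Keyed on hkl: the second (1, 0, 0) row silently overwrites the first.
by_hkl = {(h, k, l): intensity for h, k, l, intensity in measurements}

# Keyed on a synthetic id (assumed here to be a simple sequence number):
# every row survives, and hkl becomes ordinary non-key data.
by_id = {str(n): row for n, row in enumerate(measurements, start=1)}

print(duplicate_keys(measurements))
print(len(by_hkl), len(by_id))
```
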

  • chemical_conn_bond et al: the references to id are leftovers from when there were such identifiers in an earlier draft. They may be deleted.

As to the more general question of when to create such "synthetic" identifiers, there is no clear-cut answer. The original DDLm vision always had a single id for every Loop category, to make dREL of the form category[keyval] resolve. We've expanded the dREL rules so that multi-key-data-name categories will still resolve economically.

I think the practical answer is that if rows in a category will be linked to from other categories, then to avoid data name proliferation a synthetic identifier is worth creating. So, for example, the topology dictionary needs to identify nodes that are joined into a net, where a node might need an atomic label, symmetry operation id, and three lattice translations in order to identify it. The loop listing the nodes in a particular net could either refer to a synthetic node_id, or use five child data names of the above items to refer to a node - so, clearly creating a node_id is worthwhile.
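
As a rough illustration of the proliferation argument, the Python sketch below models the two options for a loop listing the nodes in a net. All class and field names (`Node`, `NetNodeById`, `NetNodeByCompoundKey`, `tx`, `ty`, `tz`, and so on) are hypothetical and are not the topology dictionary's actual data names.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Node:
    node_id: str      # synthetic identifier (hypothetical name)
    atom_label: str
    symop_id: int
    tx: int           # the three lattice translations
    ty: int
    tz: int


@dataclass(frozen=True)
class NetNodeById:
    """Row of a 'nodes in this net' loop that points at a synthetic id."""
    net_id: str
    node_id: str


@dataclass(frozen=True)
class NetNodeByCompoundKey:
    """The same row when it must repeat every component of the node key."""
    net_id: str
    atom_label: str
    symop_id: int
    tx: int
    ty: int
    tz: int


def linking_items(row_type):
    """Count the data names a row needs purely to point at a node."""
    return len(fields(row_type)) - 1  # everything except net_id


nodes = {n.node_id: n for n in [Node("n1", "Zn1", 1, 0, 0, 0),
                                Node("n2", "Zn1", 2, 1, 0, 0)]}

row = NetNodeById(net_id="netA", node_id="n2")
print(nodes[row.node_id].symop_id)
print(linking_items(NetNodeById), linking_items(NetNodeByCompoundKey))
```

With the synthetic id, each referring category needs one child data name per node reference; with the compound key it needs five, and every further category that links to nodes pays the same cost again.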

The hkl problem is a little different - the issue here is not data name proliferation, but that items with a physical meaning are used as identifiers, opening us up to possible duplication (ie not a key any more) when the science develops. The three lattice translations used to identify a node in the previous paragraph are also bad in this sense, as modulated structures need to specify lattice translations in a different way. Hmm.
