I've been looking at category keys for various reasons, and have happened upon some questions:
DIFFRN_REFLN: Keyed on _diffrn_refln.hkl, which is a Matrix of hkl values. Other categories (e.g. DIFFRN_ORIENT_REFLN) are keyed on the three indices individually.
Should this be changed?
CHEMICAL_CONN_BOND: Keyed on .atom_1 and .atom_2, but also has .id as a "Unique identifier for the bond.". The .id dataname isn't referred to anywhere else in core. The same goes for GEOM_ANGLE, GEOM_BOND, GEOM_CONTACT, GEOM_HBOND, GEOM_TORSION, and MODEL_SITE. Some of these are understandable, as there are many key datanames (looking at you, GEOM_TORSION).
Should having a single, unique, non-key identifier be a policy where there exists more than one dataname in a category key?
All of the REFLN-type categories should be keyed by a separate id type, as we cannot guarantee that hkl are unique. This is a real problem for modulated structures (hklmnop...) and raw data (same peak collected more than once). I've been putting this off, but it needs to be discussed and done.
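To make the uniqueness problem concrete, here is a minimal Python sketch. The measurement values and the helper names `duplicate_keys` and `with_synthetic_id` are invented for illustration and are not part of any dictionary or library; the sketch only shows that hkl stops working as a key once the same peak is collected twice, whereas a sequential id does not.

```python
from collections import Counter

# Hypothetical raw measurements: (h, k, l, intensity). The (1, 0, 0) peak
# appears twice, as can happen when the same peak is collected more than once.
measurements = [
    (1, 0, 0, 120.5),
    (0, 1, 0, 98.2),
    (1, 0, 0, 119.8),
]


def duplicate_keys(rows, key_len=3):
    """Return the key tuples (first key_len items) shared by more than one row."""
    counts = Counter(row[:key_len] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def with_synthetic_id(rows):
    """Key each row by a sequential id that carries no physical meaning."""
    return {i: row for i, row in enumerate(rows, start=1)}


print(duplicate_keys(measurements))          # hkl values that fail as a key
print(len(with_synthetic_id(measurements)))  # every row keeps its own entry
```

Keying on hkl would silently merge or reject the repeated peak; the synthetic id keeps all three rows addressable.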
chemical_conn_bond et al: the references to id are leftovers from when there were such identifiers in an earlier draft. May be deleted.
As to the more general question of when to create such "synthetic" identifiers, there is no clear-cut answer. The original DDLm vision always had a single id for every Loop category, to make dREL of the form category[keyval] resolve. We've expanded the dREL rules so that multi-key-data-name categories will still resolve economically.
I think the practical answer is that if rows in a category will be linked to from other categories, then to avoid data name proliferation a synthetic identifier is worth creating. So, for example, the topology dictionary needs to identify nodes that are joined into a net, where a node might need an atomic label, symmetry operation id, and three lattice translations in order to identify it. The loop listing the nodes in a particular net could either refer to a synthetic node_id, or use five child data names of the above items to refer to a node - so, clearly creating a node_id is worthwhile.
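The trade-off in the node example can be sketched in Python. Everything here is hypothetical: `NodeKey`, its field names, the labels, and `columns_per_link` are illustrative stand-ins, not items from the topology dictionary. The sketch just counts how many values a linking loop must carry per row under each option.

```python
from typing import NamedTuple


class NodeKey(NamedTuple):
    """Hypothetical five-part composite key identifying one node."""
    atom_label: str
    symop_id: int
    t_a: int
    t_b: int
    t_c: int


# Node loop: each row also gets a synthetic node_id ("n1", "n2").
nodes = {
    "n1": NodeKey("Zn1", 1, 0, 0, 0),
    "n2": NodeKey("Zn1", 2, 1, 0, 0),
}

# Option A: the link loop repeats all five key items for each end of a link.
links_composite = [(nodes["n1"], nodes["n2"])]

# Option B: the link loop refers to each end by its synthetic node_id.
links_by_id = [("n1", "n2")]


def columns_per_link(link):
    """Count the values a loop row needs to name both ends of a link."""
    return sum(len(end) if isinstance(end, tuple) else 1 for end in link)


print(columns_per_link(links_composite[0]))  # five child items per end
print(columns_per_link(links_by_id[0]))      # one id per end
```

Every further category that points at a node pays the same cost again, which is where the data name proliferation comes from.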
The hkl problem is a little different - the issue here is not data name proliferation, but that items with a physical meaning are used as identifiers, opening us up to possible duplication (i.e. no longer a key) when the science develops. The three lattice translations used to identify a node in the previous paragraph are also bad in this sense, as modulated structures need to specify lattice translations in a different way. Hmm.