The best way of recording st/l table information #12

Open
vaitkus opened this issue Apr 25, 2024 · 2 comments

@vaitkus
Collaborator

vaitkus commented Apr 25, 2024

The problem

The _atom_rho_multipole.scat_valence_table and _atom_rho_multipole.scat_core_table data items are assigned the Array container type, but are described as tables in their human-readable descriptions, i.e.:

The table contains the st/l value as the key and the scattering factor
as the value. E.g. {"0.00":"15.65","0.05":"15.32",.....etc }

Furthermore, there is also the ATOM_SCAT_VERSUS_STOL category from the CIF_CORE dictionary which records similar information using separate data items.

Currently, I can think of three different ways this data can be represented and each comes with its pros and cons.

Array

With the "Array" container type the example value would be recorded as:

[ 0.00 15.65 0.05 15.32 ... ]

Pros:

  • Both the st/l and the scattering values can be explicitly defined as numeric.

Cons:

  • The key-value pairing is only implied by the order of the elements.
  • No DDLm way of specifying that the key values must be unique.
  • No DDLm way of specifying the array must be even-sized.
  • Numeric values that describe semantically different values are stored in the same data structure.
  • Fragile -- one missing or misplaced value may break the pairing of subsequent values.
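To illustrate the fragility point, here is a minimal Python sketch of how a reader might pair up such a flat array. The helper name `pair_flat_array` is hypothetical and not part of any CIF library; the numbers are the ones from the example above.

```python
def pair_flat_array(values):
    """Pair a flat [stol, scat, stol, scat, ...] list into (stol, scat) tuples.

    Hypothetical helper for illustration only; not taken from any CIF library.
    """
    if len(values) % 2 != 0:
        # An odd length is the only structural problem that can be detected.
        raise ValueError("odd number of elements: key-value pairing is ambiguous")
    return list(zip(values[0::2], values[1::2]))


# Well-formed input: pairs come out as intended.
good = pair_flat_array([0.00, 15.65, 0.05, 15.32])

# Two values swapped: the length is still even, so nothing is flagged
# and the second pair silently has key and value the wrong way round.
swapped = pair_flat_array([0.00, 15.65, 15.32, 0.05])

print(good)
print(swapped)
```

Only the odd-length case can be caught by the length check; any misplacement that keeps the length even goes through unnoticed, which is the weakness described in the last point above.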

Table

With the "Table" container type the example value would be recorded as:

{ "0.00":15.65 "0.05":15.32 ...}

Pros:

  • Key-value pairs are explicit.
  • Missing keys or values would cause a syntactic error.
  • No duplicate key values.

Cons:

  • Key values are no longer explicitly declared as numeric (all CIF 2.0 table keys are strings), that is, a CIF validator would not automatically detect incorrect data like {"xxx":15.65}.
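As a rough sketch of what this would mean in practice, any software consuming the table would have to do the numeric check on the keys itself. The function name `parse_stol_table` is hypothetical, and the table is assumed to have already been read into a Python `dict` with string keys by some CIF parser.

```python
def parse_stol_table(table):
    """Convert a {str: number} table into a {float: float} mapping.

    Hypothetical helper for illustration only. Raises ValueError for keys
    that are not numeric or that collapse onto the same numeric value.
    """
    parsed = {}
    for key, value in table.items():
        try:
            stol = float(key)
        except ValueError:
            raise ValueError(f"table key {key!r} is not numeric") from None
        if stol in parsed:
            # "0.0" and "0.00" are distinct strings but the same st/l value.
            raise ValueError(f"table key {key!r} duplicates st/l value {stol}")
        parsed[stol] = float(value)
    return parsed


print(parse_stol_table({"0.00": 15.65, "0.05": 15.32}))
```

Note that, under this assumption, key uniqueness is only guaranteed at the string level: keys such as "0.0" and "0.00" would both be accepted syntactically, so the sketch checks for that case as well.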

Separate category

When defined as a separate category (i.e. ATOM_RHO_MULTIPOLE_VALENCE_SCAT_VERSUS_STOL) similarly to the ATOM_SCAT_VERSUS_STOL category, the example value would look something like:

loop_
_atom_rho_multipole_valence_scat_versus_stol.atom_label
_atom_rho_multipole_valence_scat_versus_stol.stol_value
_atom_rho_multipole_valence_scat_versus_stol.scat_value
C_5 0.00 15.65
C_5 0.05 15.32
# ....
C_6 0.00 15.66
C_6 0.05 15.43

This category would have a composite key that consists of cat.atom_label and cat.stol_value where the cat.atom_label is also linked to the _atom_rho_multipole.atom_label data item.

Pros:

  • Key-value pairs are explicit.
  • Uneven number of keys and values would cause a syntactic error.
  • All values can be explicitly declared as numeric (and even with different enumeration ranges, e.g. st/l could be restricted to positive values).
  • No duplicate key values.

Cons:

  • An additional level of indirection. Since the values are presented in a separate loop, they might be slightly harder to read or associate with a specific atom site.
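For comparison, the composite-key constraint is something a DDLm-aware validator could enforce from the dictionary alone. A minimal Python sketch of the equivalent check is given below; the function name `find_duplicate_keys` and the row layout (atom label, st/l, scattering factor as already-parsed loop rows) are assumptions made for illustration.

```python
def find_duplicate_keys(rows):
    """Return composite (atom_label, stol) keys that occur more than once.

    Hypothetical helper for illustration only. Each row is assumed to be
    an (atom_label, stol_value, scat_value) tuple taken from the loop.
    """
    seen = set()
    duplicates = []
    for label, stol, _scat in rows:
        key = (label, float(stol))  # compare st/l numerically, not as text
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates


rows = [
    ("C_5", "0.00", "15.65"),
    ("C_5", "0.05", "15.32"),
    ("C_6", "0.00", "15.66"),
    ("C_6", "0.00", "15.43"),  # deliberately repeats the (C_6, 0.00) key
]
print(find_duplicate_keys(rows))
```

With the table approach the same check would have to be reimplemented by every consumer, whereas here it follows directly from the category key definition.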

Additional questions

  • Should the st/l values be associated with a specific atom site identified by the unique atom label, or would it suffice to link them to atom types (as is currently done in the ATOM_SCAT_VERSUS_STOL category from the CIF core dictionary)?
  • Should the values of st/l or scat be declared as non-negative, i.e. restricted to [0, inf]? I guess it's a "no" for the scattering, but what about st/l? Note that this would also apply to the ATOM_SCAT_VERSUS_STOL category (it currently does not define any limits).

Final remarks

It seems that the two items in question were only introduced after the migration from DDL1 to DDLm and are thus very unlikely to be currently used by any piece of software. That does provide some freedom for refactoring.

Personally, I would probably go with the separate category approach. @jamesrhester, @nautolycus do you have any preference on this?

@jamesrhester
Contributor

Thanks for this detailed analysis. I strongly prefer the separate category approach. I'm not convinced that the "Table" data type is very useful in general: it is exactly equivalent to a category expressed a bit more concisely, but loses all of the benefits of exposing the category to the DDLm machinery (as you point out). I can imagine that it was introduced to avoid repeating the atom type on every line, but if we go down that path, any loop with more than one key data name can be turned into a loop with one key data name and a table, and we end up recreating a hierarchical model, but poorly.

We should associate with atom type, not site. If atoms have the same nominal type, but different form factors, then a user can create a separate atom type for each one.

st/l should always be positive.

@nautolycus
Contributor

Yes, I agree with the overall analysis and with James's preferred resolution.
