Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harmonisation of dictionary naming convention #488

Open
nautolycus opened this issue Apr 8, 2024 · 11 comments
Open

Harmonisation of dictionary naming convention #488

nautolycus opened this issue Apr 8, 2024 · 11 comments
Labels

Comments

@nautolycus
Copy link
Collaborator

@vaitkus and I were musing over the wide variation in names and references to the various dictionaries, and note the following scatter of designators (G means version on GitHub (sometimes corresponds to DDL1/DDLm versions), P refers to the current latest versions on the wwPDB site (over which COMCIFS has no say). We feel it would be beneficial to harmonise these, in ways suggested below the table. I raise this as an issue in the core area, but it relates to all active dictionary projects. If we approve a harmonisation project, I suggest making changes initially to the electron density dictionary, which I'm currently revising and which has no active input from a wider community.

dictionary ITG1e ITG2e Website Logo File DDLm _dictionary.title
Core coreCIF coreCIF coreCIF CIF_core cif_core.dic CORE_DIC
Powder pdCIF pdCIF pdCIF CIF_pd cif_pd.dic
cif_pow.dic (G) CIF_POW
Modulated struct msCIF msCIF msCIF CIF_ms cif_ms.dic CIF_MS
Electron density rhoCIF rhoCIF rhoCIF CIF_rho cif_rho.dic Cif_rho
Macromolecular mmCIF mmCIF mmCIF CIF_mm cif_mm.dic N/A
mmcif_std.dic (P)
mmcif_pdbx.dic (P)
Image imgCIF imgCIF imgCIF CIF_img cif_img.dic Cif_img.dic
mmcif_img.dic (P)
Symmetry symCIF symCIF symCIF CIF_sym cif_sym.dic N/A
Restraints N/A . . CIF_restraints cif_core_restraints.dic
cif_restr.dic (G) CIF_RSTR
Twinning N/A twinCIF . CIF_twinning cif_twinning.dic
cif_twin.dic (G) CIF_TWIN
Magnetic N/A magCIF magCIF CIF_mag cif_mag.dic MAGNETIC_CIF
Topology N/A . topoCIF CIF_topo cif_topology.dic TOPOLOGY_CIF
Topology.dic (G)
Multiblock core N/A . "multiblock coreCIF" . cif_multiblock.dic MULTIBLOCK_DIC
multi_block_core.dic (G)
DDL1 . . . CIF_DDL ddl_core.dic N/A
DDL2 . . . CIF_DDL mmcif_ddl.dic N/A
mmcif_ddl.dic (P)
DDLm N/A . . CIF_DDL ddl.dic DDL_DIC
DDLm.dic (G)

Specific suggestions:

  • cif_pow.dic -> cif_pd
  • cif-core_restraints.dic -> cif_restr.dic and logo shortened to CIF_restr
  • cif_twinning.dic -> cif_twin.dic and logo shortened to CIF_twin
  • Topology.dic and cif_topology.dic -> cif_topo.cif and logo shortened to CIF_topo
  • multi_block_core.dic and cif_multiblock.dic -> cif_multi.dic and logo commissioned in form CIF_multi
  • Subject to above, short forms restrCIF, twinCIF, topoCIF and multiCIF be adopted where relevant in itg2e and IUCr website.
  • Retain mmcif_ddl.dic for the DDL2 dictionary, as no DDL2 development outside of the mmCIF area is envisaged.
  • _dictionary.title same as the filename without .dic suffix

Other comments

  • Do we need distinct logos for DDL1, DDL2 and DDLm?
  • Not sure what is the best approach for DDL1 and DDL2 file names. Thoughts welcome.
  • It might also be nice to homogenise the naming conventions of dictionary repositories on GitHub since now the names are a bit chaotic (e.g. cif_core, TopoCIF, Powder_Dictionary). We think the repository renaming on GitHub is not very painful, since the URL of the old repository should automatically redirect the user to the new URL
@jamesrhester
Copy link
Contributor

Great work in compiling this table. I agree with all the suggestions.

I do not think DDLs need their own logos. That just draws attention to differences which are for most purposes irrelevant.

I think ddl filenames of the form ddl1.dic and ddlm.dic would work. We can't rename something the wwPDB have custody of - but we could make ddl2.dic resolve to mmcif_ddl.dic in those situations we have control over.

Yes, I agree that we should unify the repository names.

@vaitkus
Copy link
Collaborator

vaitkus commented Apr 11, 2024

One more question -- should _dictionary.title attribute values be written in all uppercase letters as is being done now? This is mostly a stylistic consideration since _dictionary.title values are case-insensitive.

For example, the current name of the cif_core.dic dictionary is CORE_CIF, so should the name value be changed to cif_core or to CIF_CORE? I think that having it in uppercase makes sense for two reasons:

  1. The _dictionary.title attribute value is also used as the category name of the HEAD category (see description of _name.category_id). Having it in uppercase would be consistent with all the other category names.
  2. The _dictionary.title attribute value is also used as the name of the main dictionary data block (i.e. data_CIF_CORE). Having it in uppercase would be more consistent with the rest of the dictionary (category save frame names are written in all uppercase, data name save frames are written in all-lowercase).

@jamesrhester
Copy link
Contributor

All uppercase works for me.

@nautolycus
Copy link
Collaborator Author

No objection.

@vaitkus
Copy link
Collaborator

vaitkus commented Apr 11, 2024

@jamesrhester, I guess the "Change approved" label is appropriate or should this discussion be circulated elsewhere as well?

@jamesrhester
Copy link
Contributor

Raising an issue/pull request in the respective repositories would be sufficient, referencing this issue.

@vaitkus
Copy link
Collaborator

vaitkus commented May 1, 2024

It was agreed in an email exchange between @jamesrhester, @nautolycus and @vaitkus that repositories should be named same as the dictionary title except in all lower case, e.g. the CIF_CORE dictionary should reside in a repository called cif_core.

There is actually one more name that would be nice to standardise -- the name of the Head save frame. This save frame is referenced by other dictionaries when importing the entire dictionary, therefore I propose to keep the name of this save frame same as the data name. That is, the cif_core.dic dictionary file would have a single CIF_CORE data block which in turn would contain the CIF_CORE save frame (see PR #490 for an example of this layout). The import statement would then look something like:

[{'dupl':Ignore  'file':cif_core.dic  'mode':Full  'save':CIF_CORE}]

Alternatively, we could call the HEAD save frame something neutral like "head", since the dictionary name would already be reflected in import statements by the dictionary filename. The import statement would then look something like:

[{'dupl':Ignore  'file':cif_core.dic  'mode':Full  'save':HEAD}]

Which one looks better?

@jamesrhester
Copy link
Contributor

I think the second one ('HEAD') provides more information to the casual reader, immediately showing that the whole of the imported dictionary is being enhanced.

@vaitkus
Copy link
Collaborator

vaitkus commented May 14, 2024

Having the same name for the data block (dictionary) and the head save frame, while syntactically valid, seems to have already caused some issue with the external software (see emmo-repo/CIF-ontology#194), so it would be best to avoid it.

I think the second one ('HEAD') provides more information to the casual reader, immediately showing that the whole of the imported dictionary is being enhanced.

I fully agree with these points, but after thinking about it some more, I think that we should also avoid having the same HEAD category name in all dictionaries since it might obfuscate certain errors (e.g. those dealing with import statements). Instead, I suggest that the HEAD category names be constructed by appending the _HEAD to the dictionary name (e.g. CIF_CORE_HEAD, CIF_PD_HEAD). What do you think?

@jamesrhester
Copy link
Contributor

I think <dictionary_name>_HEAD is an excellent idea. I had also started wondering if it was really a good idea to have the same category name (HEAD) in many dictionaries.

@nautolycus
Copy link
Collaborator Author

I'm also happy to go with <dictionary_name>_HEAD. In favour of the simple HEAD proposal, that has, to my mind, the effect of establishing a formal starting point, just as '/' indicates the root of a filesystem. Importing to a new HEAD has, in this metaphor, a similar effect as a chroot call. On the other hand, I have in the past tripped up in a chroot'd environment precisely because inspection of the root designator doesn't give you a clue that you might be in an unexpected place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants