Integrating _diffrn.id and _structure.id with the powder dictionary #171

Open
jamesrhester opened this issue Oct 21, 2024 · 22 comments

@jamesrhester
Contributor

Core CIF has recently added a few data names for handling more complex datasets that include data collected under different conditions, potentially yielding a variety of structures. These new data names are provided in the multi-block dictionary. The powder dictionary can make use of these.

_diffrn.id

First: new data name _diffrn.id (also found in mmCIF) labels a particular set of experimental conditions (ambient environment, radiation source, crystal specimen). Previously, this information was implicitly linked to a diffractogram by DIFFRN data names appearing in the same data block as the diffractogram. We should make this link explicit by defining _pd_diffractogram.diffrn_id, whose value would refer to the set of diffraction conditions relevant to the diffractogram identified by _pd_diffractogram.id.
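A minimal sketch of what the explicit link would mean for reading software, using plain Python dictionaries in place of parsed data blocks. The identifiers, temperatures, dictionary layout, and the helper `conditions_for` are illustrative assumptions and not part of any CIF library; `_pd_diffractogram.diffrn_id` is the data name proposed above.

```python
# Hypothetical in-memory form of parsed CIF data; not a real CIF API.
# Each set of diffraction conditions is keyed by its _diffrn.id value.
diffrn = {
    "A": {"_diffrn.ambient_temperature": 300.0},
    "B": {"_diffrn.ambient_temperature": 500.0},
}

# Each diffractogram carries the proposed _pd_diffractogram.diffrn_id pointer.
diffractograms = {
    "scan_1": {"_pd_diffractogram.diffrn_id": "A"},
    "scan_2": {"_pd_diffractogram.diffrn_id": "A"},
    "scan_3": {"_pd_diffractogram.diffrn_id": "B"},
}


def conditions_for(diffractogram_id):
    """Follow the explicit pointer from a diffractogram to its conditions."""
    diffrn_id = diffractograms[diffractogram_id]["_pd_diffractogram.diffrn_id"]
    return diffrn[diffrn_id]
```

With an explicit pointer, two diffractograms can share one set of conditions (`scan_1` and `scan_2` here) without relying on which data block the DIFFRN items happen to sit in.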

_structure.id

Core CIF defines a structure as a combination of the atomic sites, a unit cell, and symmetry. Clearly this is closely related to a crystallographic phase. We should determine the nature of this relationship. One of the following must hold:

  1. One _pd_phase.id implies at most one particular _structure.id
  2. A particular _structure.id describes at most one specific phase
  3. both are true
  4. neither is true

I suggest (1) is not true: for each temperature step in a multi-temperature experiment the phase would be considered the same (assuming no phase transitions), but the unit cell would be different. It follows that (3) cannot be true either. I believe that (2) is thus a reasonable assertion: any structure that is reported is the structure of a particular phase under particular conditions. This means that the powder dictionary should add a data name _structure.phase_id, identifying which phase the structure relates to. Note that the link between a structure and diffraction conditions is already taken care of by the core data name _structure.diffrn_id.

The above suggestions also start to address the points raised in #164.

Please comment, particularly regarding my understanding of the term "phase".

@rowlesmr
Collaborator

Dammit, James. ;) I thought I had it all figured out, and then I had another think and now I don't know what I think...

I think that we've been doing with _pd_phase.id what is now supposed to be done with _structure.id.

There are a couple of ways of looking at things. One is to see how the dictionary works: can we cope with having the same _pd_phase.id in multiple blocks. The other is to think on "what is a 'phase'?".

Taking the second one first, what are some interesting edge cases to look at the limits?

I think that Ian Madsen's (I think it's his) definition is a good starting point: A phase is a crystallographically-distinct material.

  • We can get a diffraction pattern from an amorphous material; it makes no sense to have unit cell params, atomic coordinates... for such a material, so there is no structure (at least in the CIF sense). So, a phase can have no structure.
  • We can index a powder pattern; that gives us a unit cell and symmetry, but we have no atomic coordinates. With these we can do some PONKCS to do QPA-type things. So a phase can have a partial structure, and I would argue that that is enough to give a _structure.id.
  • Then there is the "normal" way of doing things, where a powder pattern represents (at least one) crystalline material which has a structure (unit cell, symmetry, atom coordinates). A phase has a structure.
  • You can have multiple phases in a specimen, consisting of any or all of amorphous, partially crystalline, or fully crystalline.

Does it make sense for a phase to have more than one structure? In the CIF sense, I think the answer has to be yes. How else can you look at, say, corundum at 300 K and 1000 K, with all the concomitant changes in unit cell parameters, atomic coordinates, and displacement parameters? It's still corundum. It's just at a higher temperature.

What about the inverse question: can a structure have more than one phase? I think the answer has to be no. If multiple phases have the same structure, then they aren't crystallographically distinct.

Now, what about multi-diffractogram experiments?

  • Pressure is applied to a (single-phase) material, causing it to gradually orient. There is no crystallographic distortion.
    • PO/Texture is an extrinsic property, and is dependent on the specimen prep and diffractogram. PO is keyed on phase id and diffractogram id, so you could probably have many patterns with a single phase. Even the same specimen measured on different instruments could give different PO; cf. X-ray vs neutron and the difference in irradiated volume and distribution of PO.
  • A (single-phase) sample is held at an elevated temperature. The unit cell, symmetry, and atomic coordinates of the initial phase do not change. The sample decomposes to a different phase whose unit cell, symmetry, and atomic coordinates do not change.
    • I think this is a single phase with a single structure going to another single phase with a different single structure.
  • A (single-phase) sample is heated from room temperature to some higher temperature. The unit cell expands, the symmetry remains constant, the atomic coordinates alter slightly, and the atomic displacement parameters embiggen.
    • This is still single-phase, but it has many structures.
  • Through some sort of magic process, an end-member of a solid solution series (eg fayalite, Fe2SiO4) is slowly transformed into the opposite end-member (eg forsterite, Mg2SiO4). The unit cell alters as per Vegard's Law, the symmetry is unchanged, and the atomic coords change to accommodate the change in cation size. The site occupancy changes with each diffractogram.
    • Here we come up against how to define the difference between phases. My initial thought is that we must have many phases, each with one structure. This is because the site occupancies are changing, even though the symmetry is remaining constant. Different elemental composition = different phase. Even if each consecutive pair of diffractograms contain essentially the same phase, over the entire dataset, you start/finish with entirely different phases.

From this last one, do there exist two structures with (essentially) the same unit cell parameters, the same site occupancies, and the same symmetry, but with different atomic coordinates, that are considered to be two different phases? ie what is required to be different for two phases to be different? Unit cell params, not necessarily. Symmetry, yes. Site occupancies, probably yes. (Again, is Mg1.95Fe0.05SiO4 a different phase to Mg1.96Fe0.06SiO4?) Atomic coordinates, probably yes.

I think I'll stop there for now. It's getting late and I need sleep. I'll come back to this later.

@briantoby
Collaborator

A couple quick comments:

Pressure is applied to a (single-phase) material, causing it to gradually orient. There is no crystallographic distortion.

Any physical change alters the structure and possibly the microstructure; how that is modeled, on the other hand, is discretionary. Pressure is going to change the lattice parameters for sure.

What about the inverse question: can a structure have more than one phase? I think the answer has to be no.

There is one exception that I can think of for this -- which is more of a nomenclature issue, than a real one -- but in describing a magnetic material, one presents a structure for the atoms and one for the spins. Breaking this into two views of a single entity makes the description more compact, so there is still really only one phase, but CIF sees this as two.

@jamesrhester
Contributor Author

I think @rowlesmr's comment confirms I'm on the right track. I have proposed defining _structure.phase_id. Mathematically, therefore, _pd_phase.id is a function of _structure.id. This is equivalent to stating that, given a particular structure, a unique phase can be identified (although it doesn't have to be).
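The "function" reading can be sketched in a few lines of Python. The structure and phase identifiers below are made up for illustration, and `structures_by_phase` is a hypothetical helper; the only thing taken from the proposal is that each structure names at most one phase through `_structure.phase_id`.

```python
from collections import defaultdict

# Made-up identifiers. Each structure names at most one phase through the
# proposed _structure.phase_id; None means no phase has been given.
structure_phase_id = {
    "corundum_300K": "corundum",
    "corundum_1000K": "corundum",
    "unassigned_cell": None,
}


def structures_by_phase(mapping):
    """Invert the structure -> phase mapping; one phase may own many structures."""
    inverse = defaultdict(list)
    for structure_id, phase_id in mapping.items():
        if phase_id is not None:
            inverse[phase_id].append(structure_id)
    return dict(inverse)
```

Because the mapping runs from structure to phase, nothing forces a phase to have any structure at all: an amorphous phase would simply never appear as a value in `structure_phase_id`.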

Taking @rowlesmr 's cases from the top:

We can get a diffraction pattern from an amorphous material; it makes no sense to have unit cell params, atomic coordinates... for such a material, so there is no structure (at least in the CIF sense). So, a phase can have no structure.

ie there is no mapping from phase to structure. I am only asserting a mapping from structure to phase, so that's fine.

We can index a powder pattern; that gives us a unit cell and symmetry, but we have no atomic coordinates. With these we can do some PONKCS to do QPA-type things. So a phase can have a partial structure, and I would argue that that is enough to give a _structure.id.

A structure can be partially defined: just provide values for _cell.structure_id and _structure.space_group_id.

Then there is the "normal" way of doing things, where a powder pattern represents (at least one) crystalline material which has a structure (unit cell, symmetry, atom coordinates). A phase has a structure.
You can have multiple phases in a specimen, consisting of any or all of amorphous, partially crystalline, or fully crystalline.

If there is a mapping from structure to phase, multiple structures can map to a single phase, or there can be a one-to-one mapping. Both these situations are covered by the proposed definition.

Pressure is applied to a (single-phase) material, causing it to gradually orient. There is no crystallographic distortion.

In this situation both diffractogram and phase are important to describe PO. Structure is not directly involved. So structure maps to phase, and phase together with diffractogram determine a particular set of PO parameters. This shows the importance of phase as a concept separate to structure (as do many of the other examples).

A (single-phase) sample is held at an elevated temperature. The unit cell, symmetry, and atomic coordinates of the initial phase do not change. The sample decomposes to a different phase whose unit cell, symmetry, and atomic coordinates do not change.

Not sure how you can raise the temperature and have no changes to the structure? In any case, under the proposed definitions a single phase can have multiple structures, and if you want to name the phase differently at some point, that works as well.

Through some sort of magic process, an end-member of a solid solution series (eg fayalite, Fe2SiO4) is slowly transformed into the opposite end-member (eg forsterite, Mg2SiO4). The unit cell alters as per Vegard's Law, the symmetry is unchanged, and the atomic coords change to accommodate the change in cation size. The site occupancy changes with each diffractogram.

Here we come up against how to define the difference between phases. My initial thought is that we must have many phases, each with one structure. This is because the site occupancies are changing, even though the symmetry is remaining constant. Different elemental composition = different phase. Even if each consecutive pair of diffractograms contain essentially the same phase, over the entire dataset, you start/finish with entirely different phases.

At each point in the solid solution, there is a defined structure. Under the proposed definition, you have the flexibility of assigning each structure to a different phase, or to the same phase. The important thing is that the proposed definition doesn't commit you to a particular view of when a phase is no longer the same. It does commit you to only allowing a structure to be associated with one phase.

(Of course, even the latter can be worked around by creating a new _structure.id with identical cell etc.)

@jamesrhester
Contributor Author

There is one exception that I can think of for this -- which is more of a nomenclature issue, than a real one -- but in describing a magnetic material, one presents a structure for the atoms and one for the spins. Breaking this into two views of a single entity makes the description more compact, so there is still really only one phase, but CIF sees this as two.

I'm not sure why you say that CIF sees this as two. I'm assuming that the current approach is that a separate _pd_phase.id is created and the magnetic-only structure is presented in a separate data block? I think this can be accommodated by simply assigning a different _structure.id to the magnetic structure. As noted, this is just a bit less compact. Also, part of the reason for creating _structure.id is so that the magnetic structure (and an incommensurate structure) has a way to refer to the parent structure that doesn't involve just pointing to a data block.

As an aside, we haven't exactly bedded down how we want the magnetic structure to relate to the structure as currently defined (ie the bundle of cell, space group, and atomic positions). We can either absorb magnetic structure into structure by making magnetic space group etc. belong to structure, or we can define a separate magnetic_structure identifier, or we can do both with a magnetic structure being associated with a particular _structure.id.

@rowlesmr
Collaborator

There are a couple of ways of looking at things. One is to see how the dictionary works: can we cope with having the same _pd_phase.id in multiple blocks. The other is to think on "what is a 'phase'?".

What about the first. Can we cope with non-unique values of _pd_phase.id in the dictionary as it currently stands?

Just FYI:

save_pd_phase.id

    _definition.id                '_pd_phase.id'
    _definition.update            2022-12-03
    _description.text
;
    Arbitrary label uniquely identifying a phase.
;
    _name.category_id             pd_phase
    _name.object_id               id
    _type.purpose                 Key
    _type.source                  Assigned
    _type.container               Single
    _type.contents                Text

save_

First, which categories use _pd_phase.id as a key?

  • PD_AMORPHOUS
    • _pd_peak.id & _pd_phase.id
  • PD_CALC_COMPONENT
    • _pd_diffractogram.id, _pd_phase.id, & _pd_data.point_id
  • PD_CALIB_WAVELENGTH
    • _pd_diffractogram.id, _diffrn.id, & _pd_phase.id
  • PD_PHASE
    • _pd_phase.id
  • PD_PHASE_MASS
    • _pd_diffractogram.id & _pd_phase.id
  • PD_PREF_ORIENT
    • _pd_diffractogram.id & _pd_phase.id
  • PD_PREF_ORIENT_MARCH_DOLLASE
    • _pd_diffractogram.id, _pd_pref_orient_March_Dollase.id, & _pd_phase.id
  • PD_PREF_ORIENT_SPHERICAL_HARMONICS
    • _pd_diffractogram.id, _pd_pref_orient_spherical_harmonics.id, & _pd_phase.id
  • PD_QPA_CALIB_FACTOR
    • _pd_phase.id
  • PD_QPA_INTENSITY_FACTOR
    • _pd_diffractogram.id & _pd_phase.id
  • PD_QPA_INTERNAL_STD
    • _pd_diffractogram.id & _pd_phase.id
  • REFLN
    • _refln.index_h, _refln.index_k, _refln.index_l, & _pd_phase.id

There is a preponderance of _pd_diffractogram.id & _pd_phase.id, so as long as _pd_diffractogram.id is globally unique, and there are not multiple _pd_phase.ids in the same diffractogram, then we're golden.

PD_QPA_CALIB_FACTOR is based solely on _pd_phase.id. I will have to remind myself on how it is supposed to work and maybe add _pd_diffractogram.id to it.
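As a rough illustration of why the composite key matters, here is a Python sketch that checks key uniqueness over rows gathered from several data blocks. The rows, the shortened column names, and the helper `duplicate_keys` are all hypothetical stand-ins, not dictionary definitions or output of any real tool.

```python
from collections import Counter


def duplicate_keys(rows, key_names):
    """Return the composite key values that occur more than once in `rows`."""
    counts = Counter(tuple(row[name] for name in key_names) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Made-up PD_PHASE_MASS-like rows collected from two data blocks.
phase_mass_rows = [
    {"diffractogram_id": "d1", "phase_id": "corundum", "percent": 100.0},
    {"diffractogram_id": "d2", "phase_id": "corundum", "percent": 100.0},
]

# Keyed on diffractogram and phase together, as most categories above are.
COMPOSITE_KEY = ("diffractogram_id", "phase_id")
# Keyed on phase alone, as PD_QPA_CALIB_FACTOR currently is.
PHASE_ONLY_KEY = ("phase_id",)
```

In this sketch `duplicate_keys(phase_mass_rows, COMPOSITE_KEY)` finds no clash when the same phase id appears in two diffractograms, whereas `duplicate_keys(phase_mass_rows, PHASE_ONLY_KEY)` reports the phase id as repeated, which is the situation a phase-only key would run into.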

PD_QPA_INTERNAL_STD uses _pd_phase.id to identify the material used as an internal standard in the given diffractogram. As long as it is able to uniquely identify the phase and structure, it should be fine.

REFLN is potentially interesting, as it can also be used to list d-spacings, but you need the structure to do that, not just the phase id.

Second, which categories use _pd_phase.id, but not as a key?

  • _pd_calib_detected_intensity.phase_id
    • A code which identifies the particular phase from which this intensity was taken, if it was calibrated by a specimen.
  • _pd_calib_incident_intensity.phase_id
    • A code which identifies the particular phase from which this intensity was taken, if it was calibrated by a specimen.
  • _pd_calib_xcoord_overall.phase_id
    • A code which identifies the particular phase used in calibrating the X-coordinate, if it was calibrated by a specimen. The phase can be an internal or external standard.
  • _pd_qpa_external_std.phase_id
    • The phase (see _pd_phase.id) used as the external standard.

This works as long as the phase id is enough to uniquely identify both the phase and the structure.

@rowlesmr
Collaborator

rowlesmr commented Oct 26, 2024

We're going to have to beef up the definitions in PD_PHASE and give examples of how it is supposed to interact with STRUCTURE.*

I'll try and draw up an example CIF.

* Even if it is just to get it right in my head.

@jamesrhester
Contributor Author

I'm also working on some full examples generated from GSAS-II tutorial data. If the QPA standard is given as a phase id, then you'd have to associate only a single structure with that phase. Alternatively, you could give a structure id instead of a phase id, and that structure would be associated with a particular phase.

@briantoby
Collaborator

I think there are the following types of "project CIFs" generated in GSAS-II:

  1. single-block CIFs (one phase & one histogram)
  2. multi-block combined fits: >1 phase and/or >1 histogram
  3. sequential fits w/1 block per histogram, plus overall blocks (1 phase)
  4. sequential fits w/multiple blocks per histogram, plus overall blocks (>1 phase)

Not sure we have tutorials covering all of these. Probably all but 3, but that can be generated from the sequential fit tutorial if one only includes the majority phase.

There are probably quite a few subcases for 2, if one distinguishes one phase with multiple histograms from one histogram with multiple phases, and then cases where not all phases are found in all histograms, and also combined powder/single crystal.

@rowlesmr
Collaborator

First: new data name _diffrn.id (also found in mmCIF) labels a particular set of experimental conditions (ambient environment, radiation source, crystal specimen).

AcTuAlLy, it's defined as a label for a diffraction data set collected under particular diffraction conditions (see COMCIFS/MultiBlock_Dictionary#17).

I think it should label the conditions, so that if many diffractograms are collected under the same set of conditions, then you don't need to repeat yourself.

@rowlesmr
Collaborator

There are probably quite a few subcases for 2, if one distinguishes one phase with multiple histograms from one histogram with multiple phases, and then cases where not all phases are found in all histograms, and also combined powder/single crystal.

Definitely.

I think this is where the stress test lies. Taking a temperature-dependent experiment as a baseline (could be time, pressure, magnetic field, any other combination you'd like...)

  • Multi-diffractogram data set
    • can also include neutron CW, neutron TOF, and multiple CW X-ray diffractograms at each temperature
  • Multiple phases over all diffractograms
    • The same phases may exist in many diffractograms, may not appear in some, may appear after disappearing...
  • Multiple structures per phase
    • you have structural changes within a phase as you heat it up, but it is still (for instance) corundum.

I don't think we currently have the ability to define a structure or phase that has been co-refined over multiple diffractograms. Is this a thing we want to look at? (PD_DIFFRACTOGRAM_GROUP, anyone?)

Does core CIF worry about a structure being determined from multiple data sets?

@jamesrhester
Contributor Author

I don't think we currently have the ability to define a structure or phase that has been co-refined over multiple diffractograms. Is this a thing we want to look at? (PD_DIFFRACTOGRAM_GROUP, anyone?)

Well, using _structure.id and the new STRUCTURE category, a structure is identified using _structure.id. In the case of multiple diffractograms being used to refine a single structure, the structure is associated with a phase using the proposed _structure.phase_id. Then, for example, PD_PHASE_MASS looks like it lists the phases modelled as being present in a given diffractogram, so that's one avenue to express a single phase in multiple diffractograms, and perhaps there are others depending on how the knowledge of which phase is in which sample has come about.

The concept of a refinement has not yet been added to core CIF (which means that implicitly the results in a CIF are from a single refinement), so that's the next frontier. You could imagine a pointer in the structure category to a _refinement.id to indicate that this structure resulted from the indicated refinement.

@jamesrhester
Contributor Author

jamesrhester commented Nov 6, 2024

Please see below draft of first example: one phase, two measurements.

Each measurement is in a separate data block, each set of diffraction conditions is also in a separate data block, all other information is in a single data block. Data blocks are linked using data names linked to _diffrn.id and _pd_phase.id. I've used PD_PHASE_MASS to link a phase to a measurement.

Key issue: there is no well-defined value for _structure.diffrn_id, as we unfortunately have historically mixed environmental conditions and probe into a single category. Not a show-stopper, as it is an optional value. The remaining data names allow deduction of the environmental conditions for the structure by going structure.id -> phase_id, and noting that both diffractograms contain that phase_id, then determining that their diffrn.id has the same conditions.
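The deduction chain can be sketched in Python as follows. The identifiers and ambient values are transcribed from the example in this comment; the dictionary layout and the function `ambient_conditions` are hypothetical, and a real reader would obtain these tables from parsed data blocks rather than literals.

```python
# Values transcribed from the example in this comment; the dictionary layout
# and the function name are hypothetical.
structure_phase = {"pbso4_rt": "pbso4"}  # proposed _structure.phase_id

phases_in_diffractogram = {  # from PD_PHASE_MASS in each diffractogram block
    "PWDR PBSO4.CWN Bank 1": ["pbso4"],
    "PWDR PBSO4.XRA Bank 1": ["pbso4"],
}

diffractogram_diffrn = {  # proposed _pd_diffractogram.diffrn_id
    "PWDR PBSO4.CWN Bank 1": "11158",
    "PWDR PBSO4.XRA Bank 1": "11080",
}

ambient = {  # (ambient_pressure, ambient_temperature) for each _diffrn.id
    "11158": (0.1, 300.0),
    "11080": (0.1, 300.0),
}


def ambient_conditions(structure_id):
    """Return the shared ambient conditions for a structure, or None if ambiguous."""
    phase = structure_phase[structure_id]
    found = {
        ambient[diffractogram_diffrn[name]]
        for name, phases in phases_in_diffractogram.items()
        if phase in phases
    }
    # Only a single agreed set of conditions counts as a successful deduction.
    return found.pop() if len(found) == 1 else None
```

The sketch only succeeds because both sets of diffraction conditions report the same pressure and temperature; if they differed, the function would return `None`, which is the ambiguity that a well-defined `_structure.diffrn_id` would remove.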

A better solution is for us to deprecate _diffrn_radiation.diffrn_id (it's already in mmCIF, unfortunately) and instead define _diffrn_radiation.id, so that an experiment can be cobbled together from a _diffrn.id, a _diffrn_radiation.id, and an _exptl_crystal.id.

Notes:

  1. Example assumes data names at beginning of this issue have been defined
  2. Example implicitly assumes _diffrn_radiation_wavelength.diffrn_id has been defined (see here)
  3. _structure.id and _pd_phase.id are in the same data block as there is only one of each. A multi-temperature or multi-phase example would require splitting them apart.
#\#CIF_2.0
#
# Example of using CIF to describe two data sets, one phase
#
# Assumes use of proposed data names.
#
# There are five data blocks:
# 2 x diffraction experimental conditions
# 2 x raw powder data using `_pd_diffractogram.diffrn_id` to
#     refer to the relevant diffraction conditions
# 1 x data block for everything else
#
data_PWDR_PBSO4.CWN_Bank_1

_pd_diffractogram.id	'PWDR PBSO4.CWN Bank 1'
_pd_diffractogram.diffrn_id   11158    # <-proposed
_pd_phase_mass.phase_id  pbso4
_pd_phase_mass.percent   100

    loop_
      _pd_meas.2theta_scan
      _pd_meas.intensity_total
      _pd_meas.intensity_total_su
         10.0                          220.0             0.004
         10.05                         214.0             0.004
         10.1                          219.0             0.004
         10.15                         224.0             0.004
         10.2                          198.0             0.005
         10.25                         229.0             0.004
         10.3                          224.0             0.004

#...

data_PWDR_PBSO4.XRA_Bank_1

_pd_diffractogram.id	'PWDR PBSO4.XRA Bank 1'
_pd_diffractogram.diffrn_id   11080    # <-proposed
_pd_phase_mass.phase_id  pbso4
_pd_phase_mass.percent   100

    loop_
      _pd_meas.2theta_scan
      _pd_meas.intensity_total
      _pd_meas.intensity_total_su
         10.0                         179.0             0.005
         10.025                       147.0             0.006
         10.05                        165.0             0.006
         10.075                       172.0             0.005
         10.1                         150.0             0.006
         10.125                       165.0             0.006
#...

data_11158

_diffrn.id	11158

_diffrn.ambient_pressure	0.1
_diffrn.ambient_temperature	300.0
_diffrn_radiation.probe	        neutron
_diffrn_radiation_wavelength.value    1.909

data_11080

_diffrn.id	11080

_diffrn.ambient_pressure	0.1
_diffrn.ambient_temperature	300.0
_diffrn_radiation.probe	x-ray

loop_
      _diffrn_radiation_wavelength.id
      _diffrn_radiation_wavelength.value
     1   1.5405
     2   1.5443

data_classic

_pd_phase.id          pbso4

# Following two could be elided as no ambiguity
_structure.id         pbso4_rt
_structure.phase_id   pbso4  # <- Proposed

_cell.angle_alpha	90.0
_cell.angle_beta	90.0
_cell.angle_gamma	90.0
_cell.length_a	         8.485
_cell.length_b	         5.402
_cell.length_c	         6.965
_cell.volume           319.305

_space_group.crystal_system	orthorhombic
_space_group.laue_class	        mmm
_space_group.name_h-m_ref	'P n m a'

loop_
      _atom_site.label
      _atom_site.fract_x
      _atom_site.fract_y
      _atom_site.fract_z
      _atom_site.type_symbol
   Pb1       0.1882            0.25             0.167       Pb
   S2        0.063             0.25             0.686       S
   O3        -0.095            0.25             0.6         O
   O4        0.181             0.25             0.543       O
   O5        0.085             0.026            0.806       O

@briantoby
Collaborator

I dislike putting the measurement conditions in a separate block from the diffraction pattern data itself. To me they are very much linked and I see little advantage from separating them, so I would go with three blocks here rather than five. Perhaps four, since I like to have something that serves as a TOC. In this case the TOC info can be combined with the Phase block, but with multiple phases, that would need to be free-standing.

@rowlesmr
Collaborator

rowlesmr commented Nov 7, 2024

Please see below draft of first example: one phase, two measurements.

@jamesrhester I think you have a typo, as the X-ray diffrn.id isn't referenced anywhere

@jamesrhester
Contributor Author

Fixed

@jamesrhester
Contributor Author

I dislike putting the measurement conditions in a separate block from the diffraction pattern data itself. To me they are very much linked and I see little advantage from separating them, so I would go with three blocks here rather than five. Perhaps four, since I like to have something that serves as a TOC. In this case the TOC info can be combined with the Phase block, but with multiple phases, that would need to be free-standing.

There is indeed no technical reason that values corresponding to the _diffrn.id under which a particular diffractogram was measured couldn't be put together into a single block with the diffractogram. This approach just becomes repetitive if many diffractograms are collected under identical conditions. I suggest that when we come to draft recommendations for presenting complex PD datasets, putting measurement conditions and diffractogram in one block can be one of them.

@jamesrhester
Contributor Author

jamesrhester commented Nov 20, 2024

And here's an example of Brian's case number 2 with multiple phases, single histogram.

There are five blocks: one for each of the two structures*, one for each space group, and one for everything that there is only one of, which in this case is histogram and diffraction conditions. Separating space group from the structure block may seem like overkill in this case, but if you have a sequential fit (the next example I'll post) that means only stating the space group and symops once for each space group.

In theory you should also have separate data blocks for each phase, as a phase is distinct from a structure, but in this case those data blocks would contain only the phase identifiers, so I've placed _pd_phase.id into the structure blocks.

* remembering that a structure is a combination of cell, space group and atomic positions.

Edit: fixed C 2/c space group data block contents.
Edit 2: cannot loop pd_phase_mass because one of the key data names points to _pd_phase.id. Moved percentage mass into relevant phase data block.

#\#CIF_2.0
#
# Example: using pdCIF to describe multiple phases in a single histogram
#
# The sample contains CuCr2O4 and CuO impurity. Each structure is listed in
# a separate block. Data items that can be listed in a single block are
# collected together in the "classic" block.
#
data_classic

_diffrn.ambient_pressure	0.1
_diffrn.ambient_temperature	6.778
_diffrn_radiation.probe	        x-ray
_diffrn_radiation_wavelength.value     0.413263

_pd_diffractogram.id	'PWDR OH_00.fxye Bank 1'

loop_
      _pd_meas.2theta_scan
      _pd_meas.intensity_total
      _pd_meas.intensity_total_su
         0.5           65.24          0.007
         0.502         91.16          0.005
         0.504         83.89          0.0055
         0.506         73.26          0.006
         0.508         73.95          0.0065
         0.51          68.76          0.007
         0.512         55.25          0.008
# Lines omitted...

data_CuCr2O4

_structure.id	CuCr2O4
_structure.space_group_id	'F d d d'
_structure.phase_id    cucr2o4   # <- proposed
_pd_phase.id           cucr2o4
_pd_phase_mass.percent         98.7

_cell.angle_alpha	90.0
_cell.angle_beta	90.0
_cell.angle_gamma	90.0
_cell.length_a	7.712
_cell.length_b	8.543
_cell.length_c	8.536
_cell.volume	562.481

loop_

   _atom_site.label
   _atom_site.type_symbol
   _atom_site.fract_x
   _atom_site.fract_y
   _atom_site.fract_z
      Cu                  Cu   0.125                 0.125                 0.125               
      Cr                  Cr   0.5                   0.5                   0.5                 
      O                   O    0.2457                0.2682                0.2674
         
data_CuO

_structure.id	CuO
_structure.space_group_id	'C 2 C'
_structure.phase_id     cuo    # <- proposed
_pd_phase.id    cuo
_pd_phase_mass.percent       1.3

_cell.angle_alpha	90.0
_cell.angle_beta	99.81
_cell.angle_gamma	90.0
_cell.length_a	4.684
_cell.length_b	3.422
_cell.length_c	5.095
_cell.volume	80.517

loop_
  _atom_site.label
  _atom_site.type_symbol
  _atom_site.fract_x
  _atom_site.fract_y
  _atom_site.fract_z
       Cu1       Cu+2  0.25         0.25           0.0     
       O1         O-2  0.0          0.4184         0.25    

data_F_d_d_d

_space_group.crystal_system	orthorhombic
_space_group.id	'F d d d'
_space_group.laue_class	mmm
_space_group.name_h-m_alt	'F d d d'
_space_group.name_Hall           '-F 2uv 2vw'
#symops could go here

data_C_1_2/c_1

_space_group.crystal_system	monoclinic
_space_group.id	'C 2 C'
_space_group.name_h-m_alt	'C 2/c'
_space_group.name_Hall           '-C 2yc'
#symops could go here

@briantoby
Collaborator

I have mixed feelings on one aspect of this. It is streamlined and elegant, but separating the space group from the phase means that any older code that does not know how to follow the _structure.space_group_id pointer will see a phase without a space group, which can be considered at some level to be "write-only memory". If you feel that this method of referencing information across blocks is so much part of CIF2.0 (and is widely utilized) so that you are ready to obsolete older codes, then this is a reasonable step forward. My thought is that GSAS-II would probably not use it to preserve backwards compatibility.

If small, compact, files are the goal, CIF is probably not the answer (but then XML is way worse and that has not stopped anyone from using that).

@jamesrhester
Contributor Author

I have mixed feelings on one aspect of this. It is streamlined and elegant, but separating the space group from the phase means that any older code that does not know how to follow the _structure.space_group_id pointer will see a phase without a space group

Indeed. Fortunately, how the information is distributed over data blocks is quite flexible. The rules are:

  1. No data name can be repeated within a data block (as was ever the case)
  2. Any information repeated in different data blocks must be identical (so e.g. symops numbered the same for the same space group)
  3. Set category identifiers, and non-key pointers to those identifiers, must be included (e.g. _space_group.id and _structure.space_group_id)
  4. Data names from Set categories shouldn't be looped (so categories like DIFFRN, STRUCTURE, PD_PHASE). This rule can be dropped if _audit.schema is set to a non-default value.

I believe Rule 3 is the only practical difference to the status quo for PD data, as exemplified by GSAS-II output.

The above rules permit you to literally cut and paste the contents of the space group data blocks into the relevant structure data blocks (changing nothing) and that would still be valid.
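As a rough illustration of that cut-and-paste claim, here is a minimal Python sketch. It assumes a hypothetical in-memory form in which each data block is a plain dict of data name to value (loops are ignored), and the helper names `merge_blocks` and `pointer_resolves` are made up for this example rather than taken from any CIF library. Note that `_structure.phase_id` is still only a proposed data name.

```python
# Hypothetical representation: one dict per data block, data name -> value.
# A dict cannot hold a data name twice, which mirrors rule 1.

def merge_blocks(target, source):
    """Return a copy of `target` with the items of `source` added.

    Raises ValueError if a data name occurs in both blocks with different
    values (rule 2: repeated information must be identical).
    """
    merged = dict(target)
    for name, value in source.items():
        if name in merged and merged[name] != value:
            raise ValueError(
                f"{name}: {merged[name]!r} conflicts with {value!r}")
        merged[name] = value
    return merged


def pointer_resolves(block, pointer_name, id_name):
    """In the spirit of rule 3: both the pointer and the Set identifier
    are present in the block and carry the same value."""
    return pointer_name in block and block.get(pointer_name) == block.get(id_name)


# Values taken from the CuO blocks of the multi-temperature example.
space_group = {
    "_space_group.id": "c2c",
    "_space_group.name_H-M_alt": "C 2/c",
    "_space_group.name_Hall": "-C 2yc",
}
structure = {
    "_structure.id": "cuo_7K",
    "_structure.space_group_id": "c2c",
    "_structure.phase_id": "cuo",  # proposed data name
}

# Pasting the space group block into the structure block changes nothing
# and leaves the pointer intact.
combined = merge_blocks(structure, space_group)
```

Merging the same space group block into several structure blocks is allowed under this check because the repeated values are identical; only a genuine disagreement raises an error.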

Such flexibility is not necessarily desirable as it imposes extra burdens on software that has to read all the alternatives, not to mention the legacy issues @briantoby notes. So I think we (the PD standards community) should develop best-practice recommendations for how to distribute information over data blocks. We've already got two recommendations:

  1. Keep DIFFRN and DIFFRN_RADIATION data names in the same block as the diffractogram
  2. Keep SPACE_GROUP and SPACE_GROUP_SYMOP data names in the same block as the structure

and anticipating...

  3. Keep PD_PHASE items together with the STRUCTURE they relate to.

Meanwhile, I'm finding it quite useful to go for "maximum splittage" in these examples as that makes any missing links between categories plainer because it removes the implied link that exists when data names appear in the same data block.

If you feel that this method of referencing information across blocks is so much part of CIF2.0 (and is widely utilized) so that you are ready to obsolete older codes, then this is a reasonable step forward.

I hope my above comments demonstrate that we can keep legacy codes happy and include references across blocks. While I haven't included block id pointers in the examples, they could also be provided.

If small, compact, files are the goal, CIF is probably not the answer (but then XML is way worse and that has not stopped anyone from using that).

Sure, the most important thing is not elegance, but adoption. If none of this is used then we're wasting our time.

FWIW I think XML is on the way out, replaced by JSON.

@jamesrhester
Contributor Author

Note I've just updated the two-phase, one diffractogram example to split the reporting of the phase mass percent to the relevant per-phase data block as dictated by the above rules.

@jamesrhester
Contributor Author

And here is an example with multiple phases at multiple temperatures, generated from the GSAS-II sequential refinement tutorial, followed by GSAS-II CIF export and lots of editing and rearrangement. Note I've assumed that separating out DIFFRN_RADIATION from DIFFRN is acceptable; if not, the contents of data_diffrn_radiation_setup would be duplicated into each of the following data_?K data blocks.

I have not incorporated any of the newer additions to pdCIF (e.g. preferred orientation) yet.

One heuristic for generating these large files is:

  1. Create one data block for every distinct combination of Set category identifier values.
  2. Populate those blocks with all categories that have keys corresponding to those Set identifier values
  3. Anything left over goes in a separate "overall" block
  4. Duplicate and merge data blocks to taste. Data blocks with set identifiers that take a single value (and their combinations) can be put into the overall block. Data blocks with no common data names can be merged.
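
Steps 1 to 3 of this heuristic could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the input is a hypothetical list of `(set_ids, items)` pairs, where `set_ids` holds the Set category identifier values that a group of items depends on, and the function name `split_into_blocks` is invented here. Step 4 (merging to taste) is deliberately left out.

```python
from collections import defaultdict


def split_into_blocks(records):
    """Group (set_ids, items) records into data blocks.

    records: iterable of (set_ids, items) pairs; both are dicts mapping
    data names to values. Returns {block key: {data name: value}}, where
    the block key is a sorted tuple of (identifier name, value) pairs, or
    the string "overall" for records that depend on no Set identifier.
    """
    blocks = defaultdict(dict)
    for set_ids, items in records:
        # Step 1: one block per distinct combination of identifier values.
        # Step 3: an empty combination falls through to the overall block.
        key = tuple(sorted(set_ids.items())) or "overall"
        # Step 2: store the identifiers and the items that depend on them.
        blocks[key].update(set_ids)
        blocks[key].update(items)
    return dict(blocks)


# A few records lifted from the example; the "overall" entry is illustrative.
records = [
    ({"_diffrn.id": "7K"}, {"_diffrn.ambient_temperature": "6.778"}),
    ({"_diffrn.id": "7K"}, {"_diffrn.ambient_pressure": "100"}),
    ({"_pd_diffractogram.id": "0H_00", "_pd_phase.id": "cuo"},
     {"_pd_phase.mass_percent": "0.0112(4)"}),
    ({}, {"_audit.creation_method": "edited GSAS-II export"}),
]

blocks = split_into_blocks(records)
```

With these records the sketch yields three blocks: one for `_diffrn.id` 7K, one for the (0H_00, cuo) combination, and the overall block.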
#\#CIF_2.0
#
# Example of a dataset containing measurements at
# multiple temperatures of a two-phase sample 
# 
# Many of these datablocks can be merged without loss of information
#
#=============================================================================

# List all distinct radiations used, one per data block

data_diffrn_radiation_setup

_diffrn_radiation.id                common
_diffrn_radiation_wavelength.value  0.41326
_diffrn_radiation.probe             x-ray
_diffrn_radiation.polarisn_ratio    0.9900

#=============================================================================

# List all distinct environmental conditions, one per data block

data_7K

_diffrn.id                   7K
_diffrn.ambient_temperature  6.778
_diffrn.ambient_pressure     100
_diffrn.diffrn_radiation_id  common

data_17K

_diffrn.id                   17K
_diffrn.ambient_temperature  16.702
_diffrn.ambient_pressure     100
_diffrn.diffrn_radiation_id  common

data_47K

_diffrn.id                   47K
_diffrn.ambient_temperature  46.97
_diffrn.ambient_pressure     100
_diffrn.diffrn_radiation_id  common

#=============================================================================

# List all distinct structures, one per data block

data_cr2cuo4_7k

_structure.id                cr2cuo4_7K
_structure.diffrn_id         7K
_structure.space_group_id    fddd
_structure.phase_id          cr2cuo4

_cell.length_a  7.71270(3)
_cell.length_b  8.54329(4)
_cell.length_c  8.53643(4)
_cell.angle_alpha  90
_cell.angle_beta   90
_cell.angle_gamma  90
_cell.volume  562.481(6)

_chemical_formula.sum  "Cr2 Cu O4"
_chemical_formula.weight  231.53

# ATOMIC COORDINATES AND DISPLACEMENT PARAMETERS
loop_ 
   _atom_site.label
   _atom_site.type_symbol
   _atom_site.fract_x
   _atom_site.fract_y
   _atom_site.fract_z
   _atom_site.occupancy
   _atom_site.adp_type
   _atom_site.U_iso_or_equiv
   _atom_site.site_symmetry_multiplicity
Cu     Cu   0.12500     0.12500     0.12500     1.0000     Uiso 0.00003(22) 8   
Cr     Cr   0.50000     0.50000     0.50000     1.0000     Uiso 0.00011(22) 16  
O      O    0.24582(21) 0.2682(4)   0.2674(4)   1.0000     Uiso -0.0042(5) 32  

data_cr2cuo4_17k

_structure.id                cr2cuo4_17K
_structure.diffrn_id         17K
_structure.space_group_id    fddd
_structure.phase_id          cr2cuo4

_cell.length_a  7.71286(3)
_cell.length_b  8.54321(4)
_cell.length_c  8.53651(4)
_cell.angle_alpha  90
_cell.angle_beta   90
_cell.angle_gamma  90
_cell.volume  562.493(6)

_chemical_formula.sum  "Cr2 Cu O4"
_chemical_formula.weight  231.53

# ATOMIC COORDINATES AND DISPLACEMENT PARAMETERS
loop_ 
   _atom_site.label
   _atom_site.type_symbol
   _atom_site.fract_x
   _atom_site.fract_y
   _atom_site.fract_z
   _atom_site.occupancy
   _atom_site.adp_type
   _atom_site.U_iso_or_equiv
   _atom_site.site_symmetry_multiplicity
Cu     Cu   0.12500     0.12500     0.12500     1.0000     Uiso 0.00062(21) 8   
Cr     Cr   0.50000     0.50000     0.50000     1.0000     Uiso 0.00036(21) 16  
O      O    0.24520(20) 0.2681(4)   0.2676(4)   1.0000     Uiso -0.0042(4) 32  

data_cr2cuo4_47k

_structure.id                cr2cuo4_47K
_structure.diffrn_id         47K
_structure.space_group_id    fddd
_structure.phase_id          cr2cuo4

_cell.length_a  7.713768(29)
_cell.length_b  8.54289(3)
_cell.length_c  8.53669(4)
_cell.angle_alpha  90
_cell.angle_beta   90
_cell.angle_gamma  90
_cell.volume  562.550(5)

_chemical_formula.sum  "Cr2 Cu O4" # <- need to add ptr to _structure.id
_chemical_formula.weight  231.53   # <- need to add ptr to _structure.id

# ATOMIC COORDINATES AND DISPLACEMENT PARAMETERS
loop_ 
   _atom_site.label
   _atom_site.type_symbol
   _atom_site.fract_x
   _atom_site.fract_y
   _atom_site.fract_z
   _atom_site.occupancy
   _atom_site.adp_type
   _atom_site.U_iso_or_equiv
   _atom_site.site_symmetry_multiplicity
Cu     Cu   0.12500     0.12500     0.12500     1.0000     Uiso 0.00086(21) 8   
Cr     Cr   0.50000     0.50000     0.50000     1.0000     Uiso 0.00020(20) 16  
O      O    0.24566(20) 0.2674(4)   0.2676(4)   1.0000     Uiso -0.0032(4) 32  

data_cuo_7K

_structure.id                cuo_7K
_structure.diffrn_id         7K
_structure.space_group_id    c2c
_structure.phase_id          cuo

_cell.length_a  4.677(4)
_cell.length_b  3.4188(11)
_cell.length_c  5.131(6)
_cell.angle_alpha  90
_cell.angle_beta   99.751(21)
_cell.angle_gamma  90
_cell.volume  80.860(18)

# ATOMIC COORDINATES AND DISPLACEMENT PARAMETERS
loop_ 
   _atom_site.label
   _atom_site.type_symbol
   _atom_site.fract_x
   _atom_site.fract_y
   _atom_site.fract_z
   _atom_site.occupancy
   _atom_site.adp_type
   _atom_site.U_iso_or_equiv
   _atom_site.site_symmetry_multiplicity
Cu1    Cu2+ 0.25000     0.25000     0.00000     1.0000     Uiso 0.0010     4   
O1     O2-  0.00000     0.41840     0.25000     1.0000     Uiso 0.0010     4

data_cuo_17K

_structure.id                cuo_17K
_structure.diffrn_id         17K
_structure.space_group_id    c2c
_structure.phase_id          cuo

_cell.length_a  4.6779(31)
_cell.length_b  3.4196(10)
_cell.length_c  5.130(5)
_cell.angle_alpha  90
_cell.angle_beta   99.754(18)
_cell.angle_gamma  90
_cell.volume  80.871(16)

# ATOMIC COORDINATES AND DISPLACEMENT PARAMETERS
loop_ 
   _atom_site.label
   _atom_site.type_symbol
   _atom_site.fract_x
   _atom_site.fract_y
   _atom_site.fract_z
   _atom_site.occupancy
   _atom_site.adp_type
   _atom_site.U_iso_or_equiv
   _atom_site.site_symmetry_multiplicity
Cu1    Cu2+ 0.25000     0.25000     0.00000     1.0000     Uiso 0.0010     4   
O1     O2-  0.00000     0.41840     0.25000     1.0000     Uiso 0.0010     4

data_cuo_47K

_structure.id                cuo_47K
_structure.diffrn_id         47K
_structure.space_group_id    c2c
_structure.phase_id          cuo

_cell.length_a  4.677(3)
_cell.length_b  3.4199(10)
_cell.length_c  5.131(5)
_cell.angle_alpha  90
_cell.angle_beta   99.771(20)
_cell.angle_gamma  90
_cell.volume  80.886(17)

# ATOMIC COORDINATES AND DISPLACEMENT PARAMETERS
loop_ 
   _atom_site.label
   _atom_site.type_symbol
   _atom_site.fract_x
   _atom_site.fract_y
   _atom_site.fract_z
   _atom_site.occupancy
   _atom_site.adp_type
   _atom_site.U_iso_or_equiv
   _atom_site.site_symmetry_multiplicity
Cu1    Cu2+ 0.25000     0.25000     0.00000     1.0000     Uiso 0.0010     4   
O1     O2-  0.00000     0.41840     0.25000     1.0000     Uiso 0.0010     4   

#=============================================================================

# List spacegroups appearing in structures, one per data block

data_fddd
_space_group.id            fddd
_space_group.name_H-M_alt  "F d d d"
_space_group.name_Hall  "-F 2uv 2vw"

loop_
    _space_group_symop.id
    _space_group_symop.operation_xyz
     1  x,y,z
     2  -x,1/4+y,1/4+z
     3  1/4+x,-y,1/4+z
     4  3/4-x,1/4-y,1/2+z
     5  -x,-y,-z
# ...

data_c2c

_space_group.id            c2c
_space_group.name_H-M_alt  "C 2/c"
_space_group.name_Hall     "-C 2yc"

loop_
    _space_group_symop.id
    _space_group_symop.operation_xyz
     1  x,y,z
     2  -x,y,1/2-z
     3  -x,-y,-z
     4  x,-y,1/2+z
     5  1/2+x,1/2+y,z
     6  1/2-x,1/2+y,1/2-z
     7  1/2-x,1/2-y,-z
     8  1/2+x,1/2-y,1/2+z

#============================================================================

# List per-phase information, one phase per block

data_cr2cuo4

_pd_phase.id       cr2cuo4
_pd_phase.name     Cr2CuO4

data_cuo

_pd_phase.id       cuo
_pd_phase.name     CuO

#============================================================================

# List per-diffractogram information, one per block

data_0H_00

_pd_diffractogram.id       0H_00
_pd_diffractogram.diffrn_id  7K
_pd_meas.2theta_range_min  0.50000
_pd_meas.2theta_range_max  26.09600
_pd_meas.2theta_range_inc  0.00200
_pd_meas.number_of_points  12799

loop_
   _pd_meas.intensity_total
   _pd_calc.intensity_total
   _pd_proc.intensity_bkg_calc
   _pd_proc.ls_weight

  43.783814    41.795171    41.771237   0.0229313 
  45.626478    41.851699    41.827565   0.0219996 
  47.171463    41.908055    41.883717   0.021299  
  36.951371    41.964215    41.939672   0.0272123 
  33.266743    42.020211    41.99546    0.0301765 
  40.981582    42.075989    42.051027   0.0246658 
  41.683548    42.131611    42.106435   0.0245573 
# ... measurements omitted

data_0H_04

_pd_diffractogram.id       0H_04
_pd_diffractogram.diffrn_id   17K
_pd_meas.2theta_range_min  0.50000
_pd_meas.2theta_range_max  26.09600
_pd_meas.2theta_range_inc  0.00200
_pd_meas.number_of_points  12799

loop_
   _pd_meas.intensity_total
   _pd_calc.intensity_total
   _pd_proc.intensity_bkg_calc
   _pd_proc.ls_weight
  29.898017    41.711299    41.687919   0.0335978 
  39.768154    41.769158    41.745582   0.0253885 
  39.527914    41.826837    41.803062   0.0253699 
  46.349986    41.884328    41.860353   0.0218265 
  42.403998    41.941639    41.91746    0.0240971 
  47.876864    41.99877     41.974385   0.0210206 
  40.335609    42.055705    42.031112   0.0249116 
# ...omitted measurements

data_0H_09

_pd_diffractogram.id          0H_09
_pd_diffractogram.diffrn_id   47K
_pd_meas.2theta_range_min  0.50000
_pd_meas.2theta_range_max  26.09600
_pd_meas.2theta_range_inc  0.00200
_pd_meas.number_of_points  12799

loop_
   _pd_meas.intensity_total
   _pd_calc.intensity_total
   _pd_proc.intensity_bkg_calc
   _pd_proc.ls_weight
  42.173306    41.069518    41.047364   0.0238043 
  48.964589    41.127992    41.105653   0.0205706 
  45.8184      41.186308    41.163781   0.021942  
  43.853758    41.244428    41.221709   0.0231165 
  61.582546    41.302373    41.279462   0.0163382 
  35.581044    41.360115    41.337009   0.0282443 
  48.461362    41.417706    41.394402   0.0207815 
# ...measurements omitted

#===============================================================

# Information that is per-phase, per-diffractogram (histogram)

data_0H_cr2cuo4

_pd_diffractogram.id       0H_00
_pd_phase.id               cr2cuo4
_pd_phase.mass_percent     0.9888(4)

# The following assumes that _refln.diffractogram_id
# has been defined so that per-diffractogram results
# can be provided.

loop_
   _refln.index_h
   _refln.index_k
   _refln.index_l
   _refln.F_squared_meas
   _refln.F_squared_calc
   _refln.phase_calc
   _refln.d_spacing
  1    1    1    2923.7101  1848.8118  -0.0    4.75465    
  0    2    2    50887.9824 44176.2312 180.0   3.01930    
  2    2    0    41129.5142 38146.8191 180.0   2.86244    
  2    0    2    40719.7976 38182.2594 -180.0  2.86141   
# ...

# preferred orientation information also goes here

data_0H_cuo

_pd_diffractogram.id       0H_00
_pd_phase.id               cuo
_pd_phase.mass_percent     0.0112(4)

loop_
   _refln.index_h
   _refln.index_k
   _refln.index_l
   _refln.F_squared_meas
   _refln.F_squared_calc
   _refln.phase_calc
   _refln.d_spacing

  1    1    0    4339.5540  527.9252   180.0   2.74595   
  0    0    2    5381.1374  4772.3738  -0.0    2.52848   
  1    1    -1   7778.9105  6672.9552  180.0   2.52221   
  1    1    1    10519.8297 10580.5215 -180.0  2.31709   
  2    0    0    6179.8194  4667.8699  180.0   2.30477   
  1    1    -2   1570.8862  298.7972   -0.0    1.96126   
# ...

data_04_cr2cuo4

_pd_diffractogram.id       0H_04
_pd_phase.id               cr2cuo4
_pd_phase.mass_percent     0.9885(4)

loop_
   _refln.index_h
   _refln.index_k
   _refln.index_l
   _refln.F_squared_meas
   _refln.F_squared_calc
   _refln.phase_calc
   _refln.d_spacing

  1    1    1    2857.5850  1815.5648  -0.0    4.75469   
  0    2    2    50066.2033 44075.8291 180.0   3.01930   
  2    2    0    40970.9520 37913.9619 180.0   2.86246   
  2    0    2    40623.1705 37934.3275 -180.0  2.86146   
  1    3    1    65233.6297 67538.1258 180.0   2.54953   
# ...

data_04_cuo

_pd_diffractogram.id       0H_04
_pd_phase.id               cuo
_pd_phase.mass_percent     0.0115(4)

loop_
   _refln.index_h
   _refln.index_k
   _refln.index_l
   _refln.F_squared_meas
   _refln.F_squared_calc
   _refln.phase_calc
   _refln.d_spacing

  1    1    0    3027.3298  528.0777   180.0   2.74652    
  0    0    2    5645.9302  4772.1111  -0.0    2.52777    
  1    1    -1   7984.2080  6673.2990  180.0   2.52253    
  1    1    1    10367.4860 10581.1099 -180.0  2.31726    
  2    0    0    6208.4277  4668.0844  180.0   2.30514  
# ...

data_09_cr2cuo4

_pd_diffractogram.id       0H_09
_pd_phase.id               cr2cuo4
_pd_phase.mass_percent     0.9865(4)

loop_
   _refln.index_h
   _refln.index_k
   _refln.index_l
   _refln.F_squared_meas
   _refln.F_squared_calc
   _refln.phase_calc
   _refln.d_spacing

  1    1    1    2866.3493  1806.1692  -0.0    4.75488   
  0    2    2    48243.5918 43869.8403 180.0   3.01927   
  2    2    0    40547.2038 38008.3146 180.0   2.86260   
  2    0    2    40068.9116 37989.4186 -180.0  2.86167   
  1    3    1    66629.5169 68307.3304 180.0   2.54949   
# ...

data_09_cuo

_pd_diffractogram.id       0H_09
_pd_phase.id               cuo
_pd_phase.mass_percent     0.0135(4)

loop_
   _refln.index_h
   _refln.index_k
   _refln.index_l
   _refln.F_squared_meas
   _refln.F_squared_calc
   _refln.phase_calc
   _refln.d_spacing

  1    1    0    4614.5984  528.0627   180.0   2.74646    
  0    0    2    5629.4996  4772.3673  -0.0    2.52846    
  1    1    -1   7957.5050  6673.6405  180.0   2.52285    
  1    1    1    10650.5252 10580.9378 -180.0  2.31721    
  2    0    0    5809.4324  4667.7910  180.0   2.30463    
# ...
#--eof--eof--eof--eof--eof--eof--eof--eof--eof--eof--eof--eof--eof--eof--eof--#
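
To show how software might follow the links in the example above, here is a small Python sketch that goes from a diffractogram to its diffraction conditions, then to the structures determined under those conditions, and finally to each structure's phase and space group. The list-of-dicts representation and the function names are assumptions made for illustration, not an existing API, and `_pd_diffractogram.diffrn_id` and `_structure.phase_id` are the proposed data names discussed in this issue.

```python
def find_one(blocks, name, value):
    """Return the first block in which data name `name` has `value`."""
    for block in blocks:
        if block.get(name) == value:
            return block
    raise KeyError(f"no block with {name} = {value!r}")


def structures_for(blocks, diffractogram_id):
    """Yield (structure id, phase id, H-M symbol) for one diffractogram."""
    # The per-phase blocks also carry _pd_diffractogram.id, so pick the
    # block that additionally states the diffraction conditions.
    pattern = next(
        b for b in blocks
        if b.get("_pd_diffractogram.id") == diffractogram_id
        and "_pd_diffractogram.diffrn_id" in b
    )
    diffrn_id = pattern["_pd_diffractogram.diffrn_id"]
    for block in blocks:
        if block.get("_structure.diffrn_id") == diffrn_id:
            space_group = find_one(
                blocks, "_space_group.id", block["_structure.space_group_id"])
            yield (block["_structure.id"],
                   block["_structure.phase_id"],
                   space_group["_space_group.name_H-M_alt"])


# A subset of the example's blocks, scalar items only.
blocks = [
    {"_pd_diffractogram.id": "0H_04", "_pd_diffractogram.diffrn_id": "17K"},
    {"_pd_diffractogram.id": "0H_04", "_pd_phase.id": "cuo"},
    {"_structure.id": "cr2cuo4_17K", "_structure.diffrn_id": "17K",
     "_structure.space_group_id": "fddd", "_structure.phase_id": "cr2cuo4"},
    {"_structure.id": "cuo_17K", "_structure.diffrn_id": "17K",
     "_structure.space_group_id": "c2c", "_structure.phase_id": "cuo"},
    {"_structure.id": "cuo_7K", "_structure.diffrn_id": "7K",
     "_structure.space_group_id": "c2c", "_structure.phase_id": "cuo"},
    {"_space_group.id": "fddd", "_space_group.name_H-M_alt": "F d d d"},
    {"_space_group.id": "c2c", "_space_group.name_H-M_alt": "C 2/c"},
]

result = list(structures_for(blocks, "0H_04"))
```

For diffractogram 0H_04 this picks out the two 17K structures and skips the 7K one. Legacy software that cannot follow such pointers would still need the space group in the same block as the structure, which is what the best-practice recommendations are meant to cover.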

@jamesrhester
Contributor Author

As there haven't been any objections so far, I'm going to go ahead and submit a PR for _structure.phase_id
