ENH: Add framework for property model parameter generation (#251)
This is a significant feature improvement that introduces the concept of a `ModelFittingDescription` made up of `FittingStep` objects to tell ESPEI how to generate model parameters (with a capability to support generating parameters for custom models). In addition, these changes include:

- Significant refactoring and simplification of `espei.paramselect`, the main module for parameter selection:
    - Most of the code that was duplicated for fitting endmembers, binary interactions, and ternary interactions was unified. 
    - The `fit_ternary_interactions` function was completely removed and binary and ternary interactions now run the exact same code path.
    - The most significant changes are in `fit_parameters`, which now relies on dependency injection of a fitting description to know which fitting steps to take (we try not to make any assumptions about what parameters or data types are being fit). The function is now streamlined to 1) select relevant data from datasets, 2) get the RHS (`b` in `Ax=b`) of the linear problem from the fitting step, 3) build candidate models (a collection of feature matrices, `A`), 4) select the best model from the candidates, and 5) insert the parameter and coefficients for the best model into the database.
- Refactoring to remove `espei.parameter_selection.utils`. Some of the functionality moved into the fitting steps, and `get_sample_condition_dicts` moved to `espei.error_functions.non_equilibrium_thermochemical_error`, alongside `get_prop_samples`, which it is used in conjunction with. The aim is to have fewer "utils" modules.
- Tests to verify that molar volume and custom property models (elastic) work against various edge cases
- New docs page with a tutorial for using a custom model and fitting description for fitting elastic constants with a [repository](https://github.com/bocklund/espei-elastic-parameter-generation) containing the code
- Small tweak to follow links in dataset recursive glob search. This is useful for organizing datasets for different runs while having one single source of truth for the data
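
The five steps listed for `fit_parameters` amount to a linear model selection problem. The following is a minimal, generic sketch of steps 2 to 4 using NumPy, not ESPEI's actual implementation: the helper names `aicc` and `select_model`, the synthetic data, and the candidate features are all assumptions made for illustration. Candidate feature matrices `A` of increasing complexity are each fit to `b` by least squares, and the candidate with the lowest corrected Akaike Information Criterion (AICc) score is kept.

```python
import numpy as np


def aicc(rss, n, k):
    """AICc for a least-squares fit with n samples and k coefficients.

    Hypothetical helper; assumes Gaussian errors and n > k + 1.
    """
    aic = n * np.log(rss / n) + 2 * k
    return aic + (2 * k**2 + 2 * k) / (n - k - 1)


def select_model(candidates, b):
    """Return (feature names, coefficients) of the lowest-AICc candidate.

    `candidates` is a list of (names, A) pairs, where A is a feature matrix.
    """
    best = None
    for names, A in candidates:
        coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
        rss = float(np.sum((A @ coeffs - b) ** 2))
        # Floor the residual so that an exact fit does not produce log(0).
        score = aicc(max(rss, 1e-16), len(b), A.shape[1])
        if best is None or score < best[0]:
            best = (score, names, coeffs)
    return best[1], best[2]


# Synthetic right-hand side (assumption): b = 1000 - 2.5*T plus small noise.
T = np.linspace(300.0, 1500.0, 25)
rng = np.random.default_rng(0)
b = 1000.0 - 2.5 * T + rng.normal(scale=1.0, size=T.size)

# Candidate models of increasing complexity: [1], [1, T], [1, T, T*ln(T)], ...
features = [
    ("1", np.ones_like(T)),
    ("T", T),
    ("T*ln(T)", T * np.log(T)),
    ("T**2", T**2),
]
candidates = [
    (
        tuple(name for name, _ in features[:k]),
        np.column_stack([col for _, col in features[:k]]),
    )
    for k in range(1, len(features) + 1)
]

names, coeffs = select_model(candidates, b)
print(names, coeffs)
```

With these synthetic data the selected model always contains the `1` and `T` terms; whether an extra term survives depends on the noise, which is the trade-off the AICc penalty is meant to arbitrate.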
bocklund authored Jan 17, 2024
1 parent 85a0079 commit c3457ff
Showing 27 changed files with 1,379 additions and 567 deletions.
24 changes: 16 additions & 8 deletions docs/api/espei.parameter_selection.rst
Original file line number Diff line number Diff line change
@@ -4,6 +4,22 @@ espei.parameter\_selection package
Submodules
----------

espei.parameter\_selection.fitting\_descriptions module
-------------------------------------------------------

.. automodule:: espei.parameter_selection.fitting_descriptions
    :members:
    :undoc-members:
    :show-inheritance:

espei.parameter\_selection.fitting\_steps module
------------------------------------------------

.. automodule:: espei.parameter_selection.fitting_steps
    :members:
    :undoc-members:
    :show-inheritance:

espei.parameter\_selection.model\_building module
-------------------------------------------------

@@ -28,14 +44,6 @@ espei.parameter\_selection.selection module
    :undoc-members:
    :show-inheritance:

espei.parameter\_selection.utils module
---------------------------------------

.. automodule:: espei.parameter_selection.utils
    :members:
    :undoc-members:
    :show-inheritance:

Module contents
---------------

6 changes: 3 additions & 3 deletions docs/cu-mg-example.rst
@@ -38,7 +38,7 @@ ESPEI-datasets repository so that others may benefit from this data as you have.
You may then add your name to the CONTRIBUTORS file as described in the README.


Phases and CALPHAD models
Phases and Calphad models
=========================

The Cu-Mg system contains five stable phases: Liquid, disordered fcc and hcp,
@@ -200,7 +200,7 @@ MCMC optimization
With the data in the CU-MG input data, ESPEI generated 18 parameters to fit. For
systems with more components, solution phases, and input data, many more
parameters could be required to describe the thermodynamics of the specific
system well. Because they describe Gibbs free energies, parameters in CALPHAD
system well. Because they describe Gibbs free energies, parameters in Calphad
models are highly correlated in both single-phase descriptions and for
describing equilibria between phases. For large systems, global numerical
optimization of many parameters simultaneously is computationally intractable.
@@ -391,7 +391,7 @@ the diagonal and covariances between them under the diagonal. A more
circular covariance means that parameters are not correlated to each
other, while elongated shapes indicate that the two parameters are
correlated. Strongly correlated parameters are expected for some
parameters in CALPHAD models within phases or for phases in equilibrium,
parameters in Calphad models within phases or for phases in equilibrium,
because increasing one parameter while decreasing another would give a
similar error.

6 changes: 3 additions & 3 deletions docs/design.rst
@@ -12,8 +12,8 @@ The goal is to make it clear how different modules in ESPEI fit together and whe

ESPEI provides tools to

1. Parameterize CALPHAD models by optimizing the compromise between model accuracy and complexity. We typically call this parameter generation or model selection.
2. Fit parameterized CALPHAD models to thermochemical and phase boundary data or other custom data with uncertainty quantification via Markov chain Monte Carlo
1. Parameterize Calphad models by optimizing the compromise between model accuracy and complexity. We typically call this parameter generation or model selection.
2. Fit parameterized Calphad models to thermochemical and phase boundary data or other custom data with uncertainty quantification via Markov chain Monte Carlo

API
---
@@ -47,7 +47,7 @@ Parameter selection
-------------------

Parameter selection goes through the ``generate_parameters`` function in the ``espei.paramselect`` module.
The goal of parameter selection is go through each phase (one at a time) and fit a CALPHAD model to the data.
The goal of parameter selection is to go through each phase (one at a time) and fit a Calphad model to the data.

For each phase, the endmembers are fit first, followed by binary and ternary interactions.
For each individual endmember or interaction to fit, a series of candidate models are generated that have increasing
16 changes: 9 additions & 7 deletions docs/index.rst
@@ -11,8 +11,8 @@

\part{Introduction}

ESPEI, or Extensible Self-optimizing Phase Equilibria Infrastructure, is a tool for creating CALPHAD databases and evaluating the uncertainty of CALPHAD models.
The purpose of ESPEI is to be both a user tool for fitting state-of-the-art CALPHAD-type models and to be a research platform for developing methods for fitting and uncertainty quantification.
ESPEI, or Extensible Self-optimizing Phase Equilibria Infrastructure, is a tool for creating Calphad databases and evaluating the uncertainty of Calphad models.
The purpose of ESPEI is to be both a user tool for fitting state-of-the-art Calphad-type models and to be a research platform for developing methods for fitting and uncertainty quantification.
ESPEI uses `pycalphad`_ for the thermodynamic backend and supports fitting adjustable parameters for any pycalphad model.

ESPEI is developed in the open on `GitHub <https://github.com/PhasesResearchLab/ESPEI>`_.
@@ -26,16 +26,16 @@ What does ESPEI do?
Parameter generation
~~~~~~~~~~~~~~~~~~~~

ESPEI can be used to generate model parameters for CALPHAD models of the Gibbs energy that follow the temperature-dependent power series expansion of the Gibbs energy within the compound energy formalism (CEF) for endmembers and for binary and ternary Redlich-Kister interaction parameters with Muggianu extrapolation.
This parameter generation step augments the CALPHAD modeler by providing tools for data-driven model selection, rather than relying on a modeler's intuition alone.
ESPEI can be used to generate model parameters for Calphad models of the Gibbs energy that follow the temperature-dependent power series expansion of the Gibbs energy within the compound energy formalism (CEF) for endmembers and for binary and ternary Redlich-Kister interaction parameters with Muggianu extrapolation.
This parameter generation step augments the Calphad modeler by providing tools for data-driven model selection, rather than relying on a modeler's intuition alone.
Model generation is based on a linear regression of enthalpy, entropy, and heat capacity data (see :ref:`non-equilibrium thermochemical data <non_equilibrium_thermochemical_data>`), using the corrected Akaike Information Criterion (AICc) to prevent overfitting.

Optimization and uncertainty quantification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ESPEI can optimize and quantify the uncertainty of CALPHAD model parameters to thermochemical and :ref:`phase boundary data <phase_boundary_data>`.
ESPEI can optimize and quantify the uncertainty of Calphad model parameters to thermochemical and :ref:`phase boundary data <phase_boundary_data>`.
Optimization and uncertainty quantification are performed using a Bayesian ensemble Markov Chain Monte Carlo (MCMC) method.
Any CALPHAD database can be used, including databases generated by ESPEI or starting from an existing CALPHAD database.
Any Calphad database can be used, including databases generated by ESPEI or starting from an existing Calphad database.

ESPEI supports all models supported by pycalphad.
User-developed models that are compatible with pycalphad can be used without making any modifications to ESPEI's code.
@@ -57,7 +57,7 @@ The name ESPEI and early concept were developed by [Shang2010]_ under the superv
After developing `pycalphad`_, Richard Otis and Zi-Kui Liu reimagined the concept and wrote
`pycalphad-fitting`_ (used in [Otis2016]_ and [Otis2017]_), which formed the nucleus for the present version of ESPEI ([Bocklund2019]_).

Details on the implementation of ESPEI can be found in the following publications:
Details on the implementation of ESPEI can be found in the following publications:

- B\. Bocklund *et al.*, MRS Communications 9(2) (2019) 1–10. doi:`10.1557/mrc.2019.59 <https://doi.org/10.1557/mrc.2019.59>`_.
- B\. Bocklund, Ph.D. Dissertation (Chapter 3), The Pennsylvania State University (2021), https://etda.libraries.psu.edu/catalog/21192bjb54
@@ -122,6 +122,7 @@ Documentation


cu-mg-example
tutorial_gen_custom_mod_params

.. raw:: latex

@@ -222,6 +223,7 @@ References
.. [Coughanowr1991] Coughanowr *et al.*, Assessment of the Cu-Mg system. Zeitschrift für Met. 82, 574–581 (1991).
.. [Dinsdale1991] Dinsdale, Calphad 15(4) (1991) 317-425, doi:`10.1016/0364-5916(91)90030-N <https://doi.org/10.1016/0364-5916(91)90030-N>`_
.. [Lukas2007] Lukas, Fries, and Sundman, Computational Thermodynamics: The Calphad Method. (Cambridge University Press, 2007). doi:`10.1017/CBO9780511804137 <https://doi.org/10.1017/CBO9780511804137>`_
.. [Marker2018] Marker *et al.*, Computational Materials Science 142 (2018) 215-226. doi:`10.1016/j.commatsci.2017.10.016 <https://doi.org/10.1016/j.commatsci.2017.10.016>`_
.. [Otis2016] Otis, Ph.D. Dissertation, The Pennsylvania State University (2016). https://etda.libraries.psu.edu/catalog/s1784k73d
.. [Otis2017] Otis *et al.*, JOM 69 (2017) doi:`10.1007/s11837-017-2318-6 <http://doi.org/10.1007/s11837-017-2318-6>`_
.. [Roslyakova2016] Roslyakova *et al.*, Calphad 55 (2016) 165–180. doi:`10.1016/j.calphad.2016.09.001 <https://doi.org/10.1016/j.calphad.2016.09.001>`_
6 changes: 3 additions & 3 deletions docs/input_data.rst
@@ -35,7 +35,7 @@ To check the datasets at path ``my-input-data/`` you can run ``espei --check-dat
Phase Descriptions
==================

The JSON file for describing CALPHAD phases is conceptually similar to a setup file in Thermo-Calc's PARROT module.
The JSON file for describing Calphad phases is conceptually similar to a setup file in Thermo-Calc's PARROT module.
At the top of the file there is the ``refdata`` key that describes which reference state you would like to choose.
Currently the reference states are strings referring to dictionaries in ``pycalphad.refdata``; only ``"SGTE91"`` is implemented.

@@ -131,7 +131,7 @@ Two examples follow. The first dataset has some data for the formation heat capa
* The ``conditions`` describe temperatures (``T``) and pressures (``P``) as either scalars or one-dimensional lists.
* The type of quantity is expressed using the ``output`` key. This can in principle be any thermodynamic quantity, but currently only ``CPM*``, ``SM*``, and ``HM*`` (where ``*`` is either nothing, ``_MIX`` or ``_FORM``) are supported. Support for changing reference states is planned but not yet implemented, so all thermodynamic quantities must be formation quantities (e.g. ``HM_FORM`` or ``HM_MIX``, etc.). This is tracked by :issue:`85` on GitHub.
* ``values`` is a 3-dimensional array where each value is the ``output`` for a specific condition of pressure, temperature, and sublattice configurations from outside to inside. In other words, the shape of the array must be ``(len(P), len(T), len(subl_config))``. In the example below, the shape of the ``values`` array is (1, 12, 1) as there is one pressure scalar, 12 temperatures, and one sublattice configuration.
* There is also a key, ``excluded_model_contributions``, which will make those contributions of pycalphad's ``Model`` not be fit to when doing parameter selection or MCMC. This is useful for cases where the type of data used does not include some specific ``Model`` contributions that parameters may already exist for. For example, DFT formation energies do not include ideal mixing or (CALPHAD-type) magnetic model contributions, but formation energies from experiments would include these contributions so experimental formation energies should not be excluded.
* There is also a key, ``excluded_model_contributions``, which will make those contributions of pycalphad's ``Model`` not be fit to when doing parameter selection or MCMC. This is useful for cases where the type of data used does not include some specific ``Model`` contributions that parameters may already exist for. For example, DFT formation energies do not include ideal mixing or (Calphad-type) magnetic model contributions, but formation energies from experiments would include these contributions so experimental formation energies should not be excluded.

.. code-block:: JSON
@@ -359,7 +359,7 @@ Tags are a flexible method to adjust many ESPEI datasets simultaneously and driv
Each dataset can have a ``"tags"`` key, with a corresponding value of a list of tags, e.g. ``["dft"]``.
Any tag modifications present in the input YAML file are applied to the datasets before ESPEI is run.

They can be used in many creative ways, but some suggested ways include to add weights or to exclude model contributions, e.g. for DFT data that should not have contributions for a CALPHAD magnetic model or ideal mixing energy.
They can be used in many creative ways, but some suggested ways include to add weights or to exclude model contributions, e.g. for DFT data that should not have contributions for a Calphad magnetic model or ideal mixing energy.
An example of using the tags in an input file looks like:

.. code-block:: JSON
4 changes: 2 additions & 2 deletions docs/quickstart.rst
@@ -4,7 +4,7 @@ Quickstart
ESPEI has two different fitting modes: parameter generation and Bayesian parameter estimation, which uses Markov Chain Monte Carlo (MCMC).
You can run either of these modes or both of them sequentially.

To run either of the modes, you need to have a phase models file that describes the phases in the system using the standard CALPHAD approach within the compound energy formalism.
To run either of the modes, you need to have a phase models file that describes the phases in the system using the standard Calphad approach within the compound energy formalism.
You also need to describe the data that ESPEI should fit to.
You will need single-phase and multi-phase data for a full run.
Fit settings and all datasets are stored as JSON files and described in detail at the :ref:`Input data` page.
@@ -151,7 +151,7 @@ You can install git using ``conda install git`` on Windows.
Q: I have a large database, can I use ESPEI to optimize parameters in only a subsystem?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A: Yes, if you have a multicomponent CALPHAD database, but want to optimize or
A: Yes, if you have a multicomponent Calphad database, but want to optimize or
determine the uncertainty for a constituent unary, binary or ternary subsystem
that you have data for, you can do that without any extra effort.

2 changes: 1 addition & 1 deletion docs/recipes.rst
@@ -263,7 +263,7 @@ the diagonal and covariances between them under the diagonal. A more
circular covariance means that parameters are not correlated to each
other, while elongated shapes indicate that the two parameters are
correlated. Strongly correlated parameters are expected for some
parameters in CALPHAD models within phases or for phases in equilibrium,
parameters in Calphad models within phases or for phases in equilibrium,
because increasing one parameter while decreasing another would give a
similar likelihood.

6 changes: 3 additions & 3 deletions docs/specifying_priors.rst
@@ -28,18 +28,18 @@ There is also a special (improper) ``zero`` prior that always gives :math:`\ln p
Each ``scipy.stats`` prior is typically specified using several keyword argument
parameters, e.g. ``loc`` and ``scale``, which have special meaning for the
different distribution functions.
In order to be flexible to specifying these arguments when the CALPHAD
In order to be flexible to specifying these arguments when the Calphad
parameters they will be used for are not known beforehand, ESPEI uses a small
language to specify how the distribution hyperparameters can be set relative to
the CALPHAD parameters.
the Calphad parameters.

Basically, the ``PriorSpec`` objects are created with the name of the distribution
and the hyperparameters that are modified with
one of the modifier types: ``absolute``, ``relative``, ``shift_absolute``, or ``shift_relative``.
For example, the ``loc`` parameter might become ``loc_relative`` and ``scale`` might
become ``scale_shift_relative``.

Here are some examples of how the modifier parameters of value ``v`` modify the hyperparameters when given a CALPHAD parameter of value ``p``:
Here are some examples of how the modifier parameters of value ``v`` modify the hyperparameters when given a Calphad parameter of value ``p``:

* ``_absolute=v`` always takes the exact value passed in, ``v``; ``loc_absolute=-20`` gives a value of ``loc=-20``.
* ``_relative=v`` gives ``v*p``; ``scale_relative=0.1`` with ``p=10000`` gives a value of ``scale=10000*0.1=1000``.