Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve conservation law handling in find_identifiable_functions #984

Merged
merged 5 commits into from
Jul 16, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 12 additions & 23 deletions docs/src/inverse_problems/structural_identifiability.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,9 +21,9 @@ StructuralIdentifiability currently assesses identifiability for the first two o
!!! note
Currently, the StructuralIdentifiability.jl extension only considers structural identifiability for the ODE generated by the reaction rate equation. It is possible that for the SDE model (generated by the chemical Langevin equation) and the jump model (generated by stochastic chemical kinetics) the identifiability of model quantities is different.

## Global identifiability analysis
## [Global identifiability analysis](@id structural_identifiability_gi)

### Basic example
### [Basic example](@id structural_identifiability_gi_example)

Global identifiability can be assessed using the `assess_identifiability` function. For each model quantity (parameters and variables), it will assess whether they are:
- Globally identifiable.
Expand All @@ -48,7 +48,7 @@ assess_identifiability(gwo; measured_quantities = [:M, :P, :E], loglevel = Loggi
```
in which case all species trajectories and parameters become identifiable.

### Indicating known parameters
### [Indicating known parameters](@id structural_identifiability_gi_known_ps)
In the previous case we assumed that all parameters are unknown, however, this is not necessarily true. If there are parameters with known values, we can supply these using the `known_p` argument. Providing this additional information might also make other, previously unidentifiable, parameters identifiable. Let us consider the previous example, where we measure the concentration of $M$ only, but now assume we also know the production rate of $E$ ($pₑ$):
```@example si1
assess_identifiability(gwo; measured_quantities = [:M], known_p = [:pₑ], loglevel = Logging.Error)
Expand All @@ -57,7 +57,7 @@ Not only does this turn the previously non-identifiable `pₑ` (globally) identi

To, in a similar manner, indicate that certain initial conditions are known is a work in progress. Hopefully this feature should be an available in the near future.

### Providing non-trivial measured quantities
### [Providing non-trivial measured quantities](@id structural_identifiability_gi_nontrivial_mq)
Sometimes, ones may not have measurements of species, but rather some combinations of species (or possibly parameters). To account for this, `measured_quantities` accepts any algebraic expression (and not just single species). To form such expressions, species and parameters have to first be `@unpack`'ed from the model. Say that we have a model where an enzyme ($E$) is converted between an active and inactive form, which in turns activates the production of a product, $P$:
```@example si1
rs = @reaction_network begin
Expand All @@ -72,30 +72,30 @@ assess_identifiability(rs; measured_quantities = [Eᵢ + Eₐ, :P], loglevel = L
nothing # hide
```

### Assessing identifiability for specified quantities only
### [Assessing identifiability for specified quantities only](@id structural_identifiability_gi_ftc)
By default, StructuralIdentifiability assesses identifiability for all parameters and variables. It is, however, possible to designate precisely which quantities you want to check using the `funcs_to_check` option. This both includes selecting a smaller subset of parameters and variables to check, or defining customised expressions. Let us consider the Goodwind from previously, and say that we would like to check whether the production parameters ($pₘ$, $pₑ$, and $pₚ$) and the total amount of the three species ($P + M + E$) are identifiable quantities. Here, we would first unpack these (allowing us to form algebraic expressions) and then use the following code:
```@example si1
@unpack pₘ, pₑ, pₚ, M, E, P = gwo
assess_identifiability(gwo; measured_quantities = [:M], funcs_to_check = [pₘ, pₑ, pₚ, M + E + P], loglevel = Logging.Error)
nothing # hide
```

### Probability of correctness
### [Probability of correctness](@id structural_identifiability_gi_probs)
The identifiability methods used can, in theory, produce erroneous results. However, it is possible to adjust the lower bound for the probability of correctness using the argument `prob_threshold` (by default set to `0.99`, that is, at least a $99\%$ chance of correctness). We can e.g. increase the bound through:
```@example si1
assess_identifiability(gwo; measured_quantities=[:M], prob_threshold = 0.999, loglevel = Logging.Error)
nothing # hide
```
giving a minimum bound of $99.9\%$ chance of correctness. In practise, the bounds used by StructuralIdentifiability are very conservative, which means that while the minimum guaranteed probability of correctness in the default case is $99\%$, in practise it is much higher. While increasing the value of `prob_threshold` increases the certainty of correctness, it will also increase the time required to assess identifiability.

## Local identifiability analysis
## [Local identifiability analysis](@id structural_identifiability_lit)
Local identifiability can be assessed through the `assess_local_identifiability` function. While this is already determined by `assess_identifiability`, assessing local identifiability only has the advantage that it is easier to compute. Hence, there might be models where global identifiability analysis fails (or takes a prohibitively long time), where instead `assess_local_identifiability` can be used. This function takes the same inputs as `assess_identifiability` and returns, for each quantity, `true` if it is locally identifiable (or `false` if it is not). Here, for the Goodwind oscillator, we assesses it for local identifiability only:
```@example si1
assess_local_identifiability(gwo; measured_quantities = [:M], loglevel = Logging.Error)
```
We note that the results are consistent with those produced by `assess_identifiability` (with globally or locally identifiable quantities here all being assessed as at least locally identifiable).

## Finding identifiable functions
## [Finding identifiable functions](@id structural_identifiability_identifiabile_funcs)
Finally, StructuralIdentifiability provides the `find_identifiable_functions` function. Rather than determining the identifiability of each parameter and unknown of the model, it finds a set of identifiable functions, such as any other identifiable expression of the model can be generated by these. Let us again consider the Goodwind oscillator, using the `find_identifiable_functions` function we find that identifiability can be reduced to five globally identifiable expressions:
```@example si1
find_identifiable_functions(gwo; measured_quantities = [:M], loglevel = Logging.Error)
Expand All @@ -104,7 +104,7 @@ Again, these results are consistent with those produced by `assess_identifiabili

`find_identifiable_functions` tries to simplify its output functions to create nice expressions. The degree to which it does this can be adjusted using the `simplify` keywords. Using the `:weak`, `:standard` (default), and `:strong` arguments, increased simplification can be forced (at the expense of longer runtime).

## Creating StructuralIdentifiability compatible ODE models from Catalyst `ReactionSystem`s
## [Creating StructuralIdentifiability compatible ODE models from Catalyst `ReactionSystem`s](@id structural_identifiability_si_odes)
While the functionality described above covers the vast majority of analysis that user might want to perform, the StructuralIdentifiability package supports several additional features. While these does not have inherent Catalyst support, we do provide the `make_si_ode` function to simplify their use. Similar to the previous functions, it takes a `ReactionSystem`, lists of measured quantities, and known parameter values. The output is a [ODE of the standard form supported by StructuralIdentifiability](https://docs.sciml.ai/StructuralIdentifiability/stable/tutorials/creating_ode/#Defining-the-model-using-@ODEmodel-macro). It can be created using the following syntax:
```@example si1
si_ode = make_si_ode(gwo; measured_quantities = [:M])
Expand All @@ -116,28 +116,17 @@ print_for_DAISY(si_ode)
nothing # hide
```

## Notes on systems with conservation laws
## [Notes on systems with conservation laws](@id structural_identifiability_conslaws)
Several reaction network models, such as
```@example si3
using Catalyst, Logging, StructuralIdentifiability # hide
rs = @reaction_network begin
(k1,k2), X1 <--> X2
end
```
contain conservation laws (in this case $Γ = X1 + X2$, where $Γ = X1(0) + X2(0)$ is a constant). Because the presence of such conservation laws makes structural identifiability analysis prohibitively computationally expensive (for all but the simplest of cases), these are automatically eliminated by Catalyst (removing one ODE from the resulting ODE system for each conservation law). For the `assess_identifiability` and `assess_local_identifiability` functions, this will be unnoticed by the user. However, for the `find_identifiable_functions` and `make_si_ode` functions, this may result in one, or several, parameters of the form `Γ[i]` (where `i` is an integer) appearing in the produced expressions. These correspond to the conservation law constants and can be found through
```@example si3
conservedequations(rs)
```
E.g. if you run:
```@example si3
find_identifiable_functions(rs; measured_quantities = [:X1, :X2])
```
we see that `Γ[1]` (`= X1(0) + X2(0)`) is detected as an identifiable expression. If we want to disable this feature for any function, we can use the `remove_conserved = false` option:
```@example si3
find_identifiable_functions(rs; measured_quantities = [:X1, :X2], remove_conserved = false)
```
contain [conservation laws](@ref network_analysis_deficiency) (in this case $Γ = X1 + X2$, where $Γ = X1(0) + X2(0)$ is a constant). Because the presence of such conservation laws makes structural identifiability analysis prohibitively computationally expensive (for all but the simplest of cases), these are automatically eliminated by Catalyst. This is handled internally, and should not be noticeable to the user. The exception is the `make_si_ode` function. For each conservation law, its output will have one ODE removed, and instead have a conservation parameter (of the form `Γ[i]`) added to its equations. This feature can be disabled through the `remove_conserved = false` option.

## Systems with exponent parameters
## [Systems with exponent parameters](@id structural_identifiability_exp_params)
Structural identifiability cannot currently be applied to systems with parameters (or species) in exponents. E.g. this
```julia
rn = @reaction_network begin
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ function Catalyst.make_si_ode(rs::ReactionSystem; measured_quantities = [], know
ignore_no_measured_warn = false, remove_conserved = true)
# Creates a MTK ODESystem, and a list of measured quantities (there are equations).
# Gives these to SI to create an SI ode model of its preferred form.
osys, conseqs, _ = make_osys(rs; remove_conserved)
osys, conseqs, _, _ = make_osys(rs; remove_conserved)
measured_quantities = make_measured_quantities(rs, measured_quantities, known_p,
conseqs; ignore_no_measured_warn)
return SI.mtk_to_si(osys, measured_quantities)[1]
Expand Down Expand Up @@ -67,15 +67,15 @@ function SI.assess_local_identifiability(rs::ReactionSystem, args...;
measured_quantities = [], known_p = [], funcs_to_check = Vector(),
remove_conserved = true, ignore_no_measured_warn = false, kwargs...)
# Creates a ODESystem, list of measured quantities, and functions to check, of SI's preferred form.
osys, conseqs, vars = make_osys(rs; remove_conserved)
osys, conseqs, consconsts, vars = make_osys(rs; remove_conserved)
measured_quantities = make_measured_quantities(rs, measured_quantities, known_p,
conseqs; ignore_no_measured_warn)
funcs_to_check = make_ftc(funcs_to_check, conseqs, vars)

# Computes identifiability and converts it to a easy to read form.
out = SI.assess_local_identifiability(osys, args...; measured_quantities,
funcs_to_check, kwargs...)
return make_output(out, funcs_to_check, reverse.(conseqs))
return make_output(out, funcs_to_check, consconsts)
end

"""
Expand Down Expand Up @@ -107,15 +107,15 @@ function SI.assess_identifiability(rs::ReactionSystem, args...;
measured_quantities = [], known_p = [], funcs_to_check = Vector(),
remove_conserved = true, ignore_no_measured_warn = false, kwargs...)
# Creates a ODESystem, list of measured quantities, and functions to check, of SI's preferred form.
osys, conseqs, vars = make_osys(rs; remove_conserved)
osys, conseqs, consconsts, vars = make_osys(rs; remove_conserved)
measured_quantities = make_measured_quantities(rs, measured_quantities, known_p,
conseqs; ignore_no_measured_warn)
funcs_to_check = make_ftc(funcs_to_check, conseqs, vars)

# Computes identifiability and converts it to a easy to read form.
out = SI.assess_identifiability(osys, args...; measured_quantities,
funcs_to_check, kwargs...)
return make_output(out, funcs_to_check, reverse.(conseqs))
return make_output(out, funcs_to_check, consconsts)
end

"""
Expand Down Expand Up @@ -147,13 +147,13 @@ function SI.find_identifiable_functions(rs::ReactionSystem, args...;
measured_quantities = [], known_p = [], remove_conserved = true,
ignore_no_measured_warn = false, kwargs...)
# Creates a ODESystem, and list of measured quantities, of SI's preferred form.
osys, conseqs = make_osys(rs; remove_conserved)
osys, conseqs, consconsts, _ = make_osys(rs; remove_conserved)
measured_quantities = make_measured_quantities(rs, measured_quantities, known_p,
conseqs; ignore_no_measured_warn)

# Computes identifiable functions and converts it to a easy to read form.
out = SI.find_identifiable_functions(osys, args...; measured_quantities, kwargs...)
return vector_subs(out, reverse.(conseqs))
return vector_subs(out, consconsts)
end

### Helper Functions ###
Expand All @@ -172,22 +172,24 @@ function make_osys(rs::ReactionSystem; remove_conserved = true)

# Computes equations for system conservation laws.
# If there are no conserved equations, the `conseqs` variable must still have the `Vector{Pair{Any, Any}}` type.
if !remove_conserved
conseqs = Vector{Pair{Any, Any}}[]
else
if remove_conserved
conseqs = [ceq.lhs => ceq.rhs for ceq in conservedequations(rs)]
consconsts = [cconst.lhs => cconst.rhs for cconst in conservationlaw_constants(rs)]
isempty(conseqs) && (conseqs = Vector{Pair{Any, Any}}[])
isempty(consconsts) && (consconsts = Vector{Pair{Any, Any}}[])
else
conseqs = Vector{Pair{Any, Any}}[]
consconsts = Vector{Pair{Any, Any}}[]
end

return osys, conseqs, vars
return osys, conseqs, consconsts, vars
end

# Creates a list of measured quantities of a form that SI can read.
# Each measured quantity must have a form like:
# `obs_var ~ X` # (Here, `obs_var` is a variable, and X is whatever we can measure).
function make_measured_quantities(
rs::ReactionSystem, measured_quantities::Vector{T}, known_p::Vector{S},
conseqs; ignore_no_measured_warn = false) where {T, S}
function make_measured_quantities(rs::ReactionSystem, measured_quantities::Vector{T},
known_p::Vector{S}, conseqs; ignore_no_measured_warn = false) where {T, S}
# Warning if the user didn't give any measured quantities.
if !ignore_no_measured_warn && isempty(measured_quantities)
@warn "No measured quantity provided to the `measured_quantities` argument, any further identifiability analysis will likely fail. You can disable this warning by setting `ignore_no_measured_warn = true`."
Expand Down Expand Up @@ -218,9 +220,9 @@ end
# Processes the outputs to a better form.
# Replaces conservation law equations back in the output (so that e.g. Γ are not displayed).
# Sorts the output according to their input order (defaults to the `[unknowns; parameters]` order).
function make_output(out, funcs_to_check, conseqs)
funcs_to_check = vector_subs(funcs_to_check, conseqs)
out = OrderedDict(zip(vector_subs(keys(out), conseqs), values(out)))
function make_output(out, funcs_to_check, consconsts)
funcs_to_check = vector_subs(funcs_to_check, consconsts)
out = OrderedDict(zip(vector_subs(keys(out), consconsts), values(out)))
sortdict = Dict(ftc => i for (i, ftc) in enumerate(funcs_to_check))
return sort!(out; by = x -> sortdict[x])
end
Expand Down
Loading