Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds balancing for reactions involving compounds #657

Closed
wants to merge 18 commits into from

Conversation

LalitChauhan56
Copy link
Member

Adds balancing code and tests for reactions involving compounds as discussed in #651.

@isaacsas @TorkelE

Copy link

@ai-maintainer ai-maintainer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI-Maintainer Review for PR - Adds balancing for reactions involving compounds

Title and Description 👍

The Title and Description are clear and focused
The title and description of the pull request are clear and focused. They effectively communicate the purpose of the changes, which is to add balancing for reactions involving compounds. The reference to the discussion in issue #651 provides useful context.

Scope of Changes 👍

The changes are narrowly focused
The changes in the pull request are narrowly focused on adding balancing for reactions involving compounds. There is no indication that the author is trying to resolve multiple issues simultaneously, which is good practice.

Testing ⚠️

Testing details are missing
The description does not provide details on how the changes were tested. While it mentions that tests have been added, it would be helpful to include more information about the specific test cases used or any additional testing methodologies employed.

Documentation ⚠️

Docstrings are missing for several functions and structs
Several newly added functions and structs are missing docstrings. These should be added to describe their behavior, arguments, and return values. The following items need docstrings:
  • struct CompoundSpecies
  • struct CompoundComponents
  • struct CompoundCoefficients
  • macro compound
  • iscompound(s::Num)
  • coefficients(s::Num)
  • components(s::Num)
  • component_coefficients(s::Num)
  • create_matrix(reaction::Catalyst.Reaction)
  • get_stoich(reaction::Reaction)
  • balance(reaction::Reaction)

Suggested Changes

  • Please add docstrings to all new functions, classes, and methods to describe their behavior, arguments, and return values.
  • Please provide more information about how the changes were tested. This could include specific test cases used, any additional testing methodologies employed, or the results of the tests.

Reviewed with AI Maintainer

@isaacsas
Copy link
Member

isaacsas commented Aug 2, 2023

@LalitChauhan56 once we get the other merged and this updated to the merged master I'll give this a review. See my comments on the other PR. Thanks for splitting these up!

@LalitChauhan56
Copy link
Member Author

@isaacsas @TorkelE This PR is now up-to-date with master.

@TorkelE
Copy link
Member

TorkelE commented Aug 7, 2023

Having this in a file called "compound" seems like a poor choice of name. Made rename it "chemistry_functionality"?

@LalitChauhan56
Copy link
Member Author

@TorkelE I have changed the file name. Also, should I make a new file for the balancing code part (both in the source code and test files)?

@TorkelE
Copy link
Member

TorkelE commented Aug 8, 2023

No, that file can contain all the chemistry-related stuff.

@LalitChauhan56
Copy link
Member Author

Okay. Also, is the current approach for balancing okay or should I change anything?

@TorkelE
Copy link
Member

TorkelE commented Aug 8, 2023

Just checking, how many components on the left/right should the balancing be able to take? Is it limited to two species on each side?

E.g. should you be able to balance

H + O --> H2O

and get 2,1,1? Would it be possible to do

H2 + O2 + N2 + C --> H2O + NO2?

I will write a couple of additional tests for you to use, next step would be to ensure those are passed.

@TorkelE
Copy link
Member

TorkelE commented Aug 8, 2023

Some comments:

  • create_matrix and get_stoich have very generic names, but presumably they are meant to be used for balancing equation? Maybe rename them to something more narrow, suggesting what they are doing?
  • Could you add some comments for the create_matrix function to describe what it is doing. Maybe one before the function, and one before each of the two for loops?

Once you confirm whenever you can have any number of species types on each type I will write more tests for you.

@LalitChauhan56
Copy link
Member Author

Yes, the conditions you have said above work. Also, there is no limit on the number of reactants and products in the reaction.

@LalitChauhan56
Copy link
Member Author

LalitChauhan56 commented Aug 8, 2023

Yes, create_matrix and get_stoich are used for balancing the equation in the final function balance which gives a balanced equation as the output. Also, I will add comments at the specified places.

@TorkelE
Copy link
Member

TorkelE commented Aug 8, 2023

I'm pasting another 9 reactions here for running tests on. I haven't written out the tests, but I would in each case check that the balanced output is correct, and test that. In each case I have also written what I think would be the correct balanced reaction (but please double check if the algorithm doesn't;t seem to work!).

Several of these cases errors or gives some very strange output, so I would go through them one by one to check what is going on. Please update with any questions once you get them!

using Catalyst

@variables t
@species C(t) Ca(t) Fe(t) H(t) N(t) Na(t) O(t) P(t) S(t)

# @reaction k, 2H2O --> 2H2O
let 
    @compound H2O(t) 2H O
    r = Reaction(1.0, [H2O], [H2O], [2], [2])
    balance(r)
end

# @reaction k, 23H + 1O --> 7H2O    # Note: Definitely not balanced as given on this line!
let 
    @compound H2O(t) 2H O
    r = Reaction(1.0, [H, O], [H2O], [23, 1], [7])
    balance(r)
end

# @reaction k, 2CH4 + 3O2 --> 2CO2 + 4H2O
let 
    @compound CH4(t) 4H C
    @compound O2(t) 2O
    @compound CO2(t) C 2O
    @compound H2O(t) 2H O
    r = Reaction(1.0, [CH4, O2], [CO2, H2O])
    balance(r)
end

# @reaction k, N2 + 3H2 --> 2NH3
let
    @compound N2(t) 2N
    @compound H2(t) 2H
    @compound NH3(t) N 3H
    r = Reaction(1.0, [N2, H2], [NH3])
    balance(r)
end

# @reaction k, 2C2H5OH + CH3COOH --> 2C4H8O2 + 2H2O
let
    @compound C2H5OH(t) 2C 6H O
    @compound CH3COOH(t) 4C 4H 2O
    @compound C4H8O2(t) 4C 8H 2O
    @compound H2O(t) 2H O
    r = Reaction(1.0, [C2H5OH, CH3COOH], [C4H8O2, H2O])
    balance(r)
end

# @reaction k, 2Ca3PO42 --> 6CaO + 2P4O10
let    
    @compound Ca3PO42(t) 3Ca 2P 8O
    @compound CaO(t) Ca O
    @compound P4O10(t) 4P 10O
    r = Reaction(1.0, [Ca3PO42], [CaO, P4O10])
    balance(r)
end

# @reaction k, 4Fe + 3O2 + 6H2O --> 4FeOH3
let
    @compound O2(t) 2O 
    @compound H20(t) 2H O
    @compound FeOH3(t) Fe 3H O
    r = Reaction(1.0, [Fe, O2, H2O], [FeOH3])
    balance(r)
end

# @reaction k, 2NaOH + H2SO4 --> Na2SO4 + 2H2O
let
    @compound NaOH(t) Na O H
    @compound SO4(t) S 4O
    @compound H2SO4(t) 2H SO4
    @compound Na2SO4(t) 2Na SO4
    @compound H2O(t) 2H O
    r = Reaction(1.0, [NaOH,H2SO4], [Na2SO4,H2O], [], [])
    balance(r)
end

# @reaction k, 4NO2 --> 2N2O4
let
    @compound NO2(t) N 2O
    @compound N2O4(t) 2N 4O
    r = Reaction(1.0, [NO2], [N2O4])
    balance(r)
end

@TorkelE
Copy link
Member

TorkelE commented Aug 8, 2023

Also, maybe the balance function should be given a more specific name as well? Maybe balance_reaction?

Now if people import Catalyst, if they have anything named balance (which is not entirely out of the question) there will be a conflict, which would be unfortunate.

@LalitChauhan56
Copy link
Member Author

I tried running these, some of them give weird errors. The problem lies in the approach used within the get_stoich function. I somewhat expected it to give errors in some cases and have other approaches just in case (commented in the end of the file). I will figure out the most optimal approach and update the function to better handle such cases.

Also, the balance function expects a reaction with no stoichiometric arrays already attached to it.

For eg-

    r = Reaction(1.0, [H, O], [H2O], [23, 1], [7])   
    balance(r) #Expects reaction to be  r = Reaction(1.0, [H, O], [H2O])

Also, I'll rename the balance function to be balance_reaction

@TorkelE
Copy link
Member

TorkelE commented Aug 9, 2023

I added the one with the stochiometric array as a edge case, since most reactions will have them. It seems sensible (@isaacsas) to have it work on these as well, but simply ignore the stoichiometries (since these are not important for the balanced equation).

Let's start fixing the errors, also, try incorporating these as tests into the test files as well. Next step would be to make these tests and ensure that they pass.

@isaacsas
Copy link
Member

isaacsas commented Aug 9, 2023

Yes, this should balance a general reaction, irregardless of whether it has stoichiometry associated with it. You could always make the base version take vectors of substrates and products, and then have a dispatch that works for general reactions to pass through those vectors.

@LalitChauhan56
Copy link
Member Author

I fixed the errors and added the the above reactions as tests after correcting some of the commented balanced reactions (reactions 3,4,5).

The main issue was that even after dividing two rational number matrices we sometimes ended up with a float vector instead of a rational number vector and it has to do with how the computer handles division.
For eg - we would sometimes end up with 1.999999999999.... instead of 2
I modified the logic to handle this and also to handle the edge cases.

Also, I added comments to the create_matrix function and renamed balance to be balance_reaction. I would love if you could suggest some better names for the get_stoich and create_matrix functions.

@TorkelE
Copy link
Member

TorkelE commented Aug 9, 2023

balance_reaction_get_stoich
balance_reaction_create_matrix

@TorkelE
Copy link
Member

TorkelE commented Aug 9, 2023

For me, rounding 1.999999999999.... seems useful. Do you have any idea of how you could modify so that you don't have to deal with floats?

Good to see the test passing :)

@isaacsas
Copy link
Member

isaacsas commented Aug 9, 2023

balance_reaction_get_stoich
balance_reaction_create_matrix

Those are insanely long names... Why do we need to export these functions at all? Can't we just keep them internal and use the current names?

@isaacsas
Copy link
Member

isaacsas commented Aug 9, 2023

@LalitChauhan56 I'll take a look at this next week or this weekend -- I'm at a conference this week.

@LalitChauhan56
Copy link
Member Author

Do you have any idea of how you could modify so that you don't have to deal with floats?

To avoid floating point numbers I converted the numbers into rational numbers before division hoping that we would end up as a rational number as a result but in some cases we end up with a floating point number instead. I think this is due to how the computer handles division internally and we sometimes end-up with floating point numbers. However I think dividing the rational numbers actually has more accuracy than dividing two integers directly (i.e. rounding 1.999999999 to 2 should not have any effect). So I still have to use round() for such cases whereas normal cases without floats proceed without it. Also, in some cases the previous logic sometimes gave 0 as a stoichiometric coefficient for some cases such as r = Reaction(1.0, [NaOH,H2SO4], [Na2SO4,H2O]) so I have added logic to handle such cases as well.

@LalitChauhan56
Copy link
Member Author

LalitChauhan56 commented Aug 17, 2023

This approach uses rational numbers from the start till the end entirely avoiding float numbers and rounding. Also this ensures that the matrix is always square and solved using backward substitution method.

Also, get_stoich and create_matrix are no longer exported.

@TorkelE
Copy link
Member

TorkelE commented Aug 17, 2023

Looks good. I might separate the tests for the balancing reaction stuff into a separate test file, e.g. "test/miscellaneous_tests/reaction_balancing.jl" and then add a line to the runtests file:

@time @safetestset "Reaction balancing" begin include("miscellaneous_tests/reaction_balancing.jl") end

I think Sam will want to have a look at it as well before we merge, but I think you can start thinking about next steps now.

@LalitChauhan56
Copy link
Member Author

This separates the test file for the balancing part.

Also, as the next step I plan to further build the PubChem package to work in accordance with this and make use of the balancing functionality.

@isaacsas
Copy link
Member

@LalitChauhan56 you need to update this to master as it currently shows already merged code as new. Can you do that, then I'll take a look.

@LalitChauhan56
Copy link
Member Author

LalitChauhan56 commented Aug 18, 2023

I think it is because I changed the name of the file as suggested by @TorkelE

@isaacsas
Copy link
Member

Keep the old file name, update to master, then use git mv to rename the file.

@isaacsas
Copy link
Member

That should hopefully pick up the changes only.

Copy link
Member

@isaacsas isaacsas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you do any examples where m = n + k for k > 1?

Comment on lines 133 to 181
function get_stoich(reaction::Reaction)
# Create the matrix A using create_matrix function.
A = create_matrix(reaction)

m, n = size(A)
if m == n
B = zeros(Int64, size(A,1))
else
# Append a row with last element as 1
new_row = zeros(Int, 1, size(A, 2))
new_row[end] = 1
A = vcat(A, new_row)

B = zeros(Int64, size(A,1))
B[end] = 1
end

# Concatenate B to A to form an augmented matrix
AB = [A B]

# Apply the Bareiss algorithm
ModelingToolkit.bareiss!(AB)

# Extract the transformed A and B
A_transformed = AB[:, 1:end-1]
B_transformed = AB[:, end]

# Convert A_transformed to rational numbers
A_transformed_rational = Rational.(A_transformed)

# Convert B_transformed to rational numbers
B_transformed_rational = Rational.(B_transformed)

# Perform backward substitution
X = backward_substitution(A_transformed_rational,B_transformed_rational)

# Get the denominators of the rational numbers in X
denominators = denominator.(X)

# Compute the LCM of the denominators
lcm_value = reduce(lcm, denominators)

# Multiply each element in X by the LCM of the denominators
X_multiplied = X .* lcm_value

# Convert the rational numbers to integers
X_integers = numerator.(X_multiplied)

return abs.(X_integers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LalitChauhan56 do you have a reference for this approach? When do you only add one row when m != n? What if m = n + k with k > 1?

More generally, why not use the nullspace function from ModelingToolkit that we use for conservation laws, see

Catalyst.jl/src/networkapi.jl

Lines 1200 to 1241 in e91b001

function conservationlaws(nsm::T; col_order = nothing) where {T <: AbstractMatrix}
# compute the left nullspace over the integers
N = MT.nullspace(nsm'; col_order)
# if all coefficients for a conservation law are negative, make positive
for Nrow in eachcol(N)
all(r -> r <= 0, Nrow) && (Nrow .*= -1)
end
# check we haven't overflowed
iszero(N' * nsm) || error("Calculation of the conservation law matrix was inaccurate, "
* "likely due to numerical overflow. Please use a larger integer "
* "type like Int128 or BigInt for the net stoichiometry matrix.")
T(N')
end
function cache_conservationlaw_eqs!(rn::ReactionSystem, N::AbstractMatrix, col_order)
nullity = size(N, 1)
r = numspecies(rn) - nullity # rank of the netstoichmat
sts = species(rn)
indepidxs = col_order[begin:r]
indepspecs = sts[indepidxs]
depidxs = col_order[(r + 1):end]
depspecs = sts[depidxs]
constants = MT.unwrap.(MT.scalarize((@parameters Γ[1:nullity])[1]))
conservedeqs = Equation[]
constantdefs = Equation[]
for (i, depidx) in enumerate(depidxs)
scaleby = (N[i, depidx] != 1) ? N[i, depidx] : one(eltype(N))
(scaleby != 0) || error("Error, found a zero in the conservation law matrix where "
*
"one was not expected.")
coefs = @view N[i, indepidxs]
terms = sum(p -> p[1] / scaleby * p[2], zip(coefs, indepspecs))
eq = depspecs[i] ~ constants[i] - terms
push!(conservedeqs, eq)
eq = constants[i] ~ depspecs[i] + terms
push!(constantdefs, eq)
end
to get an integer nullspace basis?

@LalitChauhan56
Copy link
Member Author

LalitChauhan56 commented Aug 22, 2023

We end up 3 types of cases when we are building the matrix before solving -

  1. We have equal number of equations and number of variables (m == n)
  2. We have more number of equations than the number of variables (m > n)
  3. We have more number of variables than the number of equations (m < n)

In the first case it is a straightforward process to solve for X. (using the Bareiss algorithm)

In the second case we slice the matrix containing the equations so that the number of equations is equal to the number of variables and solve it similar to the first case.

In the third case, where we have more variables than the number of equations we define an auxiliary equation
by arbitrarily choosing a value for one of the coefficients ( assuming it to be 1) and solve the system similar to the first case.
Also, assuming the value of one coefficient seems to be enough in solving all systems so adding one row seems to do the job. I tried it with multiple reactions and also added those as tests. Tests now contain all the 3 cases mentioned above.
Also, regarding the use of nullspace, I don't think we can use it in the third case because the we assume the equation to be in the form x = 1 so the b vector in Ax= b is not 0. (Correct me if I am wrong).

Also, please let me know if I am missing in case where adding one row is not enough (I couldn't find any).
Reference : http://mathgene.usc.es/matlab-profs-quimica/reacciones.pdf (page 1, last paragraph)

@isaacsas
Copy link
Member

@LalitChauhan56 take a look at the second equation on http://logical.ai/chemistry/html/chem-nullspace.html. It is a case where there are two more columns then rows, but there should be a solution to balancing the reaction (in fact there are an infinite number of solutions, but I think we'd just want to pick one).

@isaacsas
Copy link
Member

Also, I don't understand your comment about Ax = b when using nullspace. The point is there is no linear system to solve when using nullspace since it returns the basis vectors for the nullspace of the matrix directly. Then, can't we just use one of them as the solution (assuming there is a non-trivial solution)?

@LalitChauhan56
Copy link
Member Author

LalitChauhan56 commented Aug 29, 2023

Yes, I think using nullspace makes sense. I got confused about the Ax = b part because of the previous approach. I have implemented the nullspace approach now. But in the cases where there are infinite solutions how do we select the appropriate set of coefficients? (sometimes the set contains 0 for a product/reactant). Does that mean that the corresponding species does not participate in that specific pathway?

For example for the reaction CO + CO2 + H2 --> CH4 + H2O the nullspace is

5×2 Matrix{Int64}:
 -4   2
  2  -2
 -4  -2
 -2   0
  0  -2

How do we select which set to use? Do we return all the possible reactions? (or maybe the first 3 in cases where we have multiple possibilities? I'm not sure)

@LalitChauhan56
Copy link
Member Author

Sorry that was a misclick.

@isaacsas
Copy link
Member

Maybe it would make sense to return a vector of Reactions when the nullspace dimension is greater than 1 (one for each basis element), along with printing a message that there are multiple independent solutions?

@isaacsas
Copy link
Member

isaacsas commented Sep 1, 2023

Finishing in #667

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants