diff --git a/docs/micom/talk.md b/docs/micom/talk.md index fb385f5..0dce263 100644 --- a/docs/micom/talk.md +++ b/docs/micom/talk.md @@ -69,7 +69,7 @@ Rodriguez-Palacios et al. https://doi.org/10.3389/fmed.2020.00009 Note: -Clostridioides difficile or C. diff is a leading cause of morbidity and mortality globally. In the US alone CDI affects nearly 500k individuals annually, ~6% of which are fatal +Clostridioides difficile or C. diff is a leading cause of morbidity and mortality globally. In the US alone, CDI affects nearly 500k individuals annually, and ~6% of cases are fatal. As you can see in this graphic, CDI disproportionately affects the US, Australia, parts of Europe and parts of Africa. --- @@ -87,11 +87,8 @@ Crobach, M. et al. https://doi.org/10.1128/cmr.00021-17 Note: -In adults estimates suggest 4-15% may be asymptomatic carriers -In infants the rate of colonization is much higher ~30% -Infection develops following disruptions to the microbiome -C. diff spores allow it to persist following disruptions -Without competition C. diff can bloom, leading to CDI +While CDI is a serious issue, C. diff isn't a bad guy in every context. In adults, estimates suggest 4-15% may be asymptomatic carriers +of C. diff, while in infants the rate of colonization is much higher at around 30%. Infection develops following disruptions to the microbiome, such as antibiotic use and diarrhea. In these contexts, C. diff spores allow it to persist following disruption, and without competition C. diff can bloom, leading to CDI. --- @@ -102,10 +99,7 @@ Without competition C. diff can bloom, leading to CDI Note: -To predict C. diff engraftment with MICOM we'll be using the following workflow. -We'll inject all our samples with 10% C. diff and peform ctFBA to estimate growth rates and fluxes. -We'll then use this information to asses the susceptibility of our samples to invasion and see what -niches C. diff might be occupying when it invades. +We want to be able to predict C.
diff engraftment with MICOM. To do so we'll be using the following workflow. As a reminder, we'll be working with samples from healthy FMT donors and individuals who had CDI and received an FMT. We'll import the abundance tables generated in yesterday's tutorial and inject all of them with 10% C. diff, a sort of in-silico invasion assay. We'll then estimate growth rates and fluxes and use this information to assess the susceptibility of our samples to invasion. We'll also use the flux predictions to understand what niches C. diff might be occupying when it invades. --- @@ -137,7 +131,7 @@ Gene content yields metabolic *capacity* or *potential*. Note: -Yesterday, we used QIIME2 to analyze sequencing data, starting from raw data and using that to examine diversity metrics and map to taxonomy, among other methods introduced by Christian. Today we'll look at functional analyses of the gut microbiome. That is, rather than looking at what populates the microbiome, we'll be determining what is happening in the microbiome, which is not always fully explained by taxonomy alone. When we first say functional analysis, a few things might jump into mind, like some of the 'omics methods listed here. Most commonly, to determine the function of the microbiome, one will do a metagenomic study, from which you can see what genes are present in the community. Mapping these genes to a reference database is simple and from there we could infer what reactions might take place in the system, and put together a network of reactions taking place. There is a caveat, though, in that gene abundance data doesn't always tell the full story. That's because gene abundance really only reports the metabolic capacity of a community. If you imagine a microbial community with really high abundance of starch degradation genes, you might infer that a lot of starch degradation pathways are active. But if the host doesn't eat any starch, then clearly this inference is wrong.
Gene abudances tell you what *could* happen, not necessarily what is happening. +Yesterday, we used QIIME2 to analyze sequencing data, starting from raw sequences and ultimately generating a table of amplicon sequence variants or ASVs. We also assigned taxonomy to our ASVs and used diversity metrics to compare samples. Today we'll look at functional analyses of the gut microbiome. That is, rather than looking at what populates the microbiome, we'll be determining what is happening in the microbiome, which is not always fully explained by taxonomy alone. When we first say functional analysis, a few things might jump to mind, like some of the 'omics methods listed here. Most commonly, to determine the function of the microbiome, one will do a metagenomic study, from which you can see what genes are present in the community. Mapping these genes to a reference database is simple and from there we could infer what reactions might take place in the system, and put together a network of reactions taking place. There is a caveat, though, in that gene abundance data doesn't always tell the full story. That's because gene abundance really only reports the metabolic capacity of a community. If you imagine a microbial community with really high abundance of starch degradation genes, you might infer that a lot of starch degradation pathways are active. But if the host doesn't eat any starch, then clearly this inference is wrong. Gene abundances tell you what *could* happen, not necessarily what is happening. --- @@ -173,7 +167,7 @@ video courtesy of [S. Nayyak](https://twitter.com/Na_y_ak) and [J. Iwasa](https: Note: -Great, so what are fluxes? Rather than abundances, which are a measure of concentration, fluxes are measures of mass conversion, into the system and through internal reactions. Since they represent a rate, rather than a concentration, fluxes are measured in units of concentration per unit time, for instance mmol per hour.
In bacterial communities, we typical scale flux by relative abundance, to account for differential metabolic contributions by more- or less-abundant taxa. So why am I claiming that fluxes are more informative than abudances? Well if we look at the example on the right hand side, which shows glucose going into glycolysis and into the TCA cycle, we see that the cell in question is importing a ton of glucose, but that glucose is almost immediately being converted into pyruvate. If we were to measure interal glucose levels, they might be super low. This might lead us to incorrectly assume that glucose isn't required for this cell to grow. This doesn't tell the whole story, though, since glucose is obviously very important. What would be more informative, would be to see the flux of glucose into the cell, and the resulting pathways that are activated and metabolites produced. Clearly, we can learn a lot from looking at fluxes in a system like this. The problem with measuring fluxes, however, is that it is laborious and quite costly. Longitudinal metabolomics can work, in which we measure metabolic abundances across time points, and calculate fluxes between those timepoints, but this takes time, effort and expense and is limited to those metabolites that are included in the metabolic panel. Isotopic labeling is another option, wherein we feed the community for instance glucose with isotopically labeled carbon, and see how many of those carbons end up as pyruvate but that can be even more costly and is still limited to just those metabolites being investigated. So what we need is a way to easily and accurately estimate fluxes, without the labor and expense associated with these methods. Luckily, there is already a computational tool for that. +Great, so what are fluxes? Rather than abundances, which are a measure of concentration, fluxes are measures of mass conversion, into the system and through internal reactions. 
Since they represent a rate, rather than a concentration, fluxes are measured in units of amount per unit time, for instance mmol per hour. In bacterial communities, we typically scale flux by relative abundance, to account for differential metabolic contributions by more- or less-abundant taxa. So why am I claiming that fluxes are more informative than abundances? Well, if we look at the example on the right hand side, which shows glucose going into glycolysis and the TCA cycle, we see that the cell in question is importing a ton of glucose, but that glucose is almost immediately being converted into pyruvate. If we were to measure internal glucose levels, they might be super low. This might lead us to incorrectly assume that glucose isn't required for this cell to grow. Obviously this doesn't tell the whole story, though, since glucose is clearly being utilized. What would be more informative would be to see the flux of glucose into the cell, and the resulting pathways that are activated and metabolites produced. Clearly, we can learn a lot from looking at fluxes in a system like this. The problem with measuring fluxes, however, is that it is laborious and quite costly. Longitudinal metabolomics can work, in which we measure metabolite abundances across time points, and calculate fluxes between those timepoints, but this takes time, effort and expense and is limited to those metabolites that are included in the metabolic panel. It also often requires growth in a defined medium. Isotopic labeling is another option, wherein we feed the community isotopically labeled substrate using natural isotopes of carbon or nitrogen, and look for the appearance of those isotopes in other cellular metabolites. However, that can be even more costly and is still limited to just those metabolites being investigated. So what we need is a way to easily and accurately estimate fluxes, without the labor and expense associated with these methods.
Luckily, there is already a computational tool for that. --- @@ -196,7 +190,7 @@ To do just that, we can use a method called flux balance analysis, which has bec Note: -Let's take a look at how FBA works and how we can use that to infer these fluxes. FBA is a powerful computational tool, that allows us to infer fluxes through reactions in a system by reducing the possible solution space of fluxes to a biologically relevant one. FBA makes the critical assumption that the system being modeled, in this case the metabolism of a microbe, is in a state of constant flow, or steady state. This just means that the inflow and outflow of reactions in the systems are equal. A good analogy to this is opening a window in your house: you'll have air flowing in, and air flowing out, but you won't suddnely have more flowing in than out, suddenly your house expands and pops like a balloon because more is coming in than going out. That doesn't happen. This steady state assumption mimicks this, and is applicable in several biological phenomena. If we look at our example reaction here, we have hydrogen and oxygen combining to form water. In steady state, we can say that stoichiometrically, the amount of hydrogen and oxygen consumed are equal to the amount of water produced. We can represent this mathematically with this first equation - reactants with negative coefficients and products with positive coeffients, balanced to zero. Repeating this for all the reactions in our system and factoring out the fluxes, we end up with a stoichiometric maxtrix designated "S" that represents all of the reactions, as well as an unknown vector of fluxes, designated "v", which we are solving for. To satisfy our assumption of constant flow, we set S dot v equal to zero, which sets up our system of equations. Additionally, we can put constraints on the individual reactions, seen here. For instance, we can set the lower bound of this flux to zero. 
This will force the production of water, thereby adding flux through the first equation. Solving our set of equations leaves us with a flux cone, the space in multidimensional flux space that includes all sets of fluxes satisfying our constraints. You might notice, however, that there are an infinite number of solutions in our flux cone that satisfy these constraints. So, whats next?? +Let's take a look at how FBA works and how we can use that to infer these fluxes. FBA is a powerful computational tool that allows us to infer fluxes through reactions in a system by reducing the possible solution space of fluxes to a biologically relevant one. FBA makes the critical assumption that the system being modeled, in this case the metabolism of a microbe, is in a state of constant flow, or steady state. This just means that the inflow and outflow of reactions in the system are equal. A good analogy to this is opening a window in your house: you'll have air flowing in, and air flowing out, but you won't suddenly have more flowing in than out. If you did, then your house would expand and pop like a balloon due to the build up of pressure. That doesn't happen. The steady state assumption mimics this, and is applicable to several biological phenomena. If we look at our example reaction here, we have hydrogen and oxygen combining to form water. In steady state, we can say that stoichiometrically, the amount of hydrogen and oxygen consumed is equal to the amount of water produced. We can represent this mathematically with this first equation - reactants with negative coefficients and products with positive coefficients, balanced to zero. Repeating this for all the reactions in our system and factoring out the fluxes, we end up with a stoichiometric matrix designated "S" that represents all of the reactions, as well as an unknown vector of fluxes, designated "v", which we are solving for.
To satisfy our assumption of constant flow, we set S dot v equal to zero, which sets up our system of equations. Additionally, we can further constrain the problem, given known limits for the fluxes. Solving our set of equations leaves us with a flux cone, the space in multidimensional flux space that includes all sets of fluxes satisfying our constraints. You might notice, however, that there are an infinite number of solutions in our flux cone that satisfy these constraints. So, what's next? --- @@ -215,7 +209,7 @@ Well, the goal of FBA is to reduce that flux space to one that is biologically r Note: -To find that solution, we can look at growth rates. We know that bacteria can only be present in the system if they can grow, otherwise they would be excreted and no longer exist in the system. In practice, this means we can say we want the solution in that flux cone that corresponds with maximum biomass, since this is the biological goal of a growing microbe. Biomass production can also be represented as a flux, since it is in essence a measure of mass conversion. What's neat about flux balance analysis is that we can add information from a number of sources in order to determine our optimal solution. As we've mentioned, we can use the reactions present in the system to build a stoichiometric matrix, mathematically representing the reactions taking place and how they interact with each other, for instance precursors from one reaction required in another. We can actually infer this whole stoichiometric maxtrix from the the genome of a microbe, by looking at gene abundances to determine what reactions are potentially taking place in the system. Also, as we've mentioned, we make the assumption the system is at steady state. In practice, this means the bacteria is in the exponential growth phase, as the growth rate is constant and biomass production is constant, so we include this assumption to arrive at a solution.
Reversibility of reactions can be determined by thermodynamics, as many reactions can only occur in one direction. Finally, we can also add constraints to the import of metabolites going into a system, which we can leverage to represent the metabolites available to the system to grow. in practice, we can use this to model effects of different diets on the microbiome, adding relevance to the predictions being made by FBA. Adding these constraints in and solving for our fluxes, we can find the region in the flux cone that represents maximum biomass, shown here in red. +To find that solution, we can look at growth rates. We know that bacteria can only be present in the system if they can grow, otherwise they would be excreted and no longer exist in the system. In practice, this means we can say we want the solution in the flux cone that corresponds with maximum biomass, since this is generally the biological goal of a growing microbe. Biomass production can also be represented as a flux, since it is in essence a measure of mass conversion. What's great about flux balance analysis is that we can add information from a number of sources to constrain the problem and determine our optimal solution. As we've mentioned, we can use the reactions present in the system to build a stoichiometric matrix, mathematically representing the reactions taking place and how they interact with each other. We can actually infer this whole stoichiometric matrix from the genome of a microbe, by looking at gene abundances to determine what reactions are potentially taking place in the system. Also, as we've mentioned, we make the assumption the system is at steady state. In practice, this means the bacteria are in the exponential growth phase, as the growth rate is constant and biomass production is constant, so we include this assumption to arrive at a possible solution.
Additionally, reversibility of reactions can be determined by thermodynamics, as many reactions can only occur in one direction, thus limiting the space of possible fluxes. Finally, we can add constraints to the import of metabolites going into a system, which we can leverage to represent the metabolites available to the system to grow. In practice, we can use this to model effects of different diets on the microbiome, adding relevance to the predictions being made by FBA. Cumulatively, these constraints help us find the region in the flux cone that represents maximum biomass, shown here in red. --- @@ -246,7 +240,7 @@ https://micom-dev.github.io/micom Note: -To this point, we've focused on making genome scale metabolic models. To make metabolic models of the microbiome community, or metagenome scale metabolic models, we use a tool called MICOM, which extends flux balance analysis into microbial communities. To initially build the models, we need to pass in the relative abundance of bacteria in the sample. Due to sequencing efficiency differences, there may be some bias toward some bacteria over others, but the abundance from sequencing data should be more or less representative of the community. MICOM will then map this abundance data to a database containing hundreds genome-scale metabolic models of common gut microbes. MICOM then uses the reconstructions of those taxa present in the sample to build a massive stoichiometric matrix like the one we discussed earlier that includes not only the internal reactions within each taxon, but also the exchanges between them and external reactions, which in this case is the host. We can then specify particular diets that are representative of the food being eaten by the subject of the model.
It will then use FBA with an additional regularization step to calculate unique growth rates for all the bacteria based on the process we've outlined previously, and then estimate all the fluxes in the system based on those growth rates, returning a most likely flux distribution. This give us a huge amount of output data, which we can then use to make detailed and interesting predictions about imports, exports, the inner machinations of bacterial reaction networks, co-dependcies between bacteria, and it provides a testing ground for potential interventions. +To this point, we've focused on making genome scale metabolic models. To make metabolic models of the microbiome community, or metagenome scale metabolic models, we use a tool called MICOM, which extends flux balance analysis into microbial communities. To initially build the models, we need to pass in the relative abundance of bacteria in the sample. Due to sequencing efficiency differences, there may be some bias toward some bacteria over others, but the abundance from sequencing data should be more or less representative of the community. MICOM will then map this abundance data to a database containing hundreds of genome-scale metabolic models of common gut microbes. MICOM then uses the reconstructions of those taxa present in the sample to build a massive stoichiometric matrix like the one we discussed earlier that includes not only the internal reactions within each taxon, but also the exchanges between them and external reactions, which in this case is the host. We can then specify a diet that is representative of the food being eaten by the subject of the model and use FBA with an additional regularization step to calculate unique growth rates for all the bacteria based on the process we've outlined previously. Following this, estimates of all the fluxes in the system can be generated based on those growth rates, returning a most likely flux distribution.
This gives us a huge amount of output data, which we can then use to make detailed and interesting predictions about imports, exports, the inner machinations of bacterial reaction networks, and co-dependencies between bacteria, and it provides a testing ground for potential interventions. --- @@ -298,6 +292,10 @@ Harcombe et al. 2013, https://doi.org/10.1371/journal.pcbi.1003091 +Note: + +As noted previously, classical FBA works fairly well when predicting the growth rates of individual bacteria. Here we see the results of a study that compared empirically measured fluxes with those predicted by FBA across a series of evolved E. coli strains. We see that the predictions agree well with the empirical estimates and the evolved strains are able to achieve growth rates that are 90-95% of the theoretical maximum. + --- ## Community-scale metabolic models - pretty rowdy @@ -311,6 +309,10 @@ Senne de Oliveira Lino et al. 2021, https://doi.org/10.1038/s41467-021-21844-7 +Note: + +. + --- ## Estimating community wide growth rates with cooperative trade off flux balance analysis (ctFBA)