Developed for 13C metabolic flux analysis (MFA), this tool fits mass isotopomer distribution (MID) data from mass spectrometry to deduce metabolic activities. It includes functionality for MFA, sensitivity analysis, experimental data analysis, and visualization of results for publication.
The tool is built for Python 3.8 and requires the following packages:
Package | Tested version |
---|---|
numpy | 1.22 |
scipy | 1.7 |
matplotlib | 3.6 |
tqdm | 4.64 |
pandas | 1.5.2 |
sklearn | 3.0 |
xlsxwriter | 3.0 |
numba (optional) | 0.56 |
Anaconda is recommended as it includes most of the necessary packages. The packages provided by Anaconda are also optimized for Intel CPUs, ensuring better performance.
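To compare an existing environment against the tested versions, a small check along these lines may help. This is a minimal sketch, not part of the repository: `TESTED_VERSIONS` is copied from the table above, and `report_versions` is a hypothetical helper that only reads each module's `__version__` attribute.

```python
import importlib

# Versions copied from the table above (the versions the tool was tested with).
TESTED_VERSIONS = {
    "numpy": "1.22",
    "scipy": "1.7",
    "matplotlib": "3.6",
    "tqdm": "4.64",
    "pandas": "1.5.2",
    "sklearn": "3.0",
    "xlsxwriter": "3.0",
    "numba": "0.56",  # optional
}


def report_versions(tested=TESTED_VERSIONS):
    """Return {module name: installed version, or None if the import fails}."""
    report = {}
    for name in tested:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        # Some modules do not expose __version__; report that explicitly.
        report[name] = getattr(module, "__version__", "unknown")
    return report
```

Calling `report_versions()` and printing each entry next to `TESTED_VERSIONS` shows which packages are missing or differ from the tested versions.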
Models used in this software are in the scripts/model folder.
The basic model (base_model) contains the base model used in algorithm development, analysis of data availability, and experimental data analysis for cultured cells.
The basic model with GLC and CIT buffers (base_model_with_glc_tca_buffer) contains the model for analysis of in vivo infusion data from patients, which differs slightly from the base model in several reactions.
All 13C-isotope labeling data are in the scripts/data folder.
Infusion data from patients with renal, brain and lung cancer are from Courtney et al, 2018 (renal_carcinoma/data.xlsx) and Faubert et al, 2017 (lung_tumor/data.xlsx).
Labeling data from the cultured cell line HCT-116 are from Reid et al, 2018 (hct116_cultured_cell_line/13C-Glucose_tracing_Mike.xlsx).
Data for the eight colon cancer cell lines were generated in this study (colon_cancer_cell_line/data.xlsx).
These raw data are loaded and converted to a standard form for MFA.
For MFA results, please refer to the Results section.
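The conversion code itself is not described here. As an illustration of one step such a conversion usually involves, the sketch below turns raw isotopologue intensities into a MID whose fractions sum to 1. The function name `normalize_mid` and the example intensities are hypothetical, and whether the repository normalizes in exactly this way is an assumption.

```python
def normalize_mid(intensities):
    """Convert raw isotopologue intensities (M+0, M+1, ...) to fractions summing to 1."""
    if any(value < 0 for value in intensities):
        raise ValueError("intensities must be non-negative")
    total = sum(intensities)
    if total <= 0:
        raise ValueError("at least one intensity must be positive")
    return [value / total for value in intensities]


# Hypothetical intensities for a three-carbon metabolite (M+0 .. M+3).
raw = [600.0, 100.0, 50.0, 250.0]
mid = normalize_mid(raw)
```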
The algorithm and solver used in this study are located in the scripts/src/core folder.
The model and data folders include class definitions and the corresponding processing functions. Specifically, the EMU algorithm is implemented in model/emu_analyzer_functions.py.
Most optimizations are based on slsqp_solver and slsqp_numba_solver. As their names indicate, slsqp_numba_solver is implemented with the numba package for faster execution (roughly 50% time reduction).
However, the numba version has a memory leak in parallelized executions on Linux. For long runs (longer than 50 hours), the normal version is recommended.
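The recommendation above can be written down as a small decision rule. The sketch below is illustrative only: `choose_solver` and its parameters are hypothetical and not part of the repository, and it simply encodes the 50-hour, parallel, Linux condition stated above.

```python
import importlib.util
import sys


def choose_solver(expected_hours, parallel, platform=None, numba_installed=None):
    """Pick a solver name following the recommendation in this README."""
    if platform is None:
        platform = sys.platform
    if numba_installed is None:
        # numba is optional, so check whether it can be imported at all.
        numba_installed = importlib.util.find_spec("numba") is not None
    if not numba_installed:
        return "slsqp_solver"
    # Long parallel runs on Linux are where the memory leak was reported.
    risky = parallel and platform.startswith("linux") and expected_hours > 50
    return "slsqp_solver" if risky else "slsqp_numba_solver"
```

For example, `choose_solver(60, parallel=True, platform="linux", numba_installed=True)` returns `"slsqp_solver"`, while a 10-hour run with the same settings returns `"slsqp_numba_solver"`.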
This script can also be run as a plain Python project. Make sure Python 3.8 and all required packages are correctly installed. First switch to a target directory and download the source code:
git clone https://github.com/LocasaleLab/Automated-MFA-2023
Switch to the source directory, add it to the PYTHONPATH environment variable, and run main.py:
cd Automated-MFA-2023
export PYTHONPATH=$PYTHONPATH:`pwd`
python main.py
You can try different arguments according to the help information. For example:
python main.py computation experiments flux_analysis hct116_cultured_cell_line -t
This command runs a computation: a flux_analysis process on the data in experiments named hct116_cultured_cell_line, in test mode (-t). This process typically completes in 30 minutes.
The detailed argument list is explained below.
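When several jobs need to be launched from a script, the same command can be assembled programmatically. The sketch below is a hypothetical helper, not part of the repository. It only builds the argument list from the positional arguments and the -p and -t flags documented in this README, and it assumes it is run from the repository root with PYTHONPATH set as shown above.

```python
import sys


def build_command(analysis, running_mode, job_name, test_mode=False, parallel_num=None):
    """Assemble the argument list for `python main.py computation ...`."""
    command = [sys.executable, "main.py", "computation", analysis, running_mode, job_name]
    if parallel_num is not None:
        command += ["-p", str(parallel_num)]
    if test_mode:
        command.append("-t")
    return command


command = build_command(
    "experiments", "flux_analysis", "hct116_cultured_cell_line", test_mode=True
)
# To launch it: import subprocess; subprocess.run(command, check=True)
```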
There are two options under the main menu:
figure: Option to generate the figures in the paper. This option must be executed after the analysis results have been generated by the computation option.
computation: Option to run most computations.
There are four options under the computation menu:
standard_name: Option to output the standard names of metabolites and reactions.
simulation: Option to generate simulated MID data.
sensitivity: Option to analyze protocol, model, data and config sensitivity of MFA.
experiments: Option to run MFA for several experimental data analyses.
The standard_name option outputs the standard names of all metabolites and reactions to common_data/raw_data/standard_name.xlsx.
The simulation option generates simulated MID data used in algorithm development and robustness analysis. The simulated data are written as an Excel file to common_data/raw_data/simulated_data and as a pickle file to scripts/data/simulated_data. This option has the following optional parameters:
-b, --batch_num n: Generate a batch of n known flux vectors and the corresponding simulated MID data. It is used to verify the performance of the algorithm on multiple simulated datasets.
-f, --new_flux: If this flag is present, a new known flux vector optimized from PHDGH mass spectrometry data is generated. Otherwise, the stored flux vector is loaded.
-n, --with_noise: If this flag is present, all-available and experimentally-available MID data are generated with randomized noise. Otherwise, the precise MID data are generated.
-i, --index p: Add an extra number suffix p to the generated simulated data file. It is usually used to distinguish newly generated simulated data.
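For readers who want to see how these four parameters fit together, the following argparse sketch mirrors them. It is an assumption about how such a parser could look, not the repository's actual parser: the function name `build_simulation_parser`, the integer types, and the help strings are illustrative.

```python
import argparse


def build_simulation_parser():
    """Parser mirroring the documented simulation parameters (illustrative only)."""
    parser = argparse.ArgumentParser(prog="main.py computation simulation")
    parser.add_argument("-b", "--batch_num", type=int, default=None, metavar="n",
                        help="number of known flux vectors to generate in a batch")
    parser.add_argument("-f", "--new_flux", action="store_true",
                        help="generate a new known flux vector instead of loading the stored one")
    parser.add_argument("-n", "--with_noise", action="store_true",
                        help="add randomized noise to the simulated MID data")
    parser.add_argument("-i", "--index", type=int, default=None, metavar="p",
                        help="number suffix appended to the simulated data file")
    return parser


# Example: a batch of 30 flux vectors with noisy MID data.
args = build_simulation_parser().parse_args(["-b", "30", "-n"])
```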
Usage: python main.py computation {sensitivity, experiments} running_mode job_name
Positional arguments
running_mode: Running mode of the script.
  flux_analysis: Option to start a new flux analysis process for the target job.
  result_process: Option to process the analysis results of the target job.
  solver_output: Option to output the detailed model, data and configurations of the target job.
  raw_experimental_data_plotting: Only available in experiments mode. Option to display the raw experimental data of the target job.
job_name: Name of the target job. Available jobs are listed below.
Optional arguments
-p, --parallel_num: Number of parallel processes. If not provided, it is selected according to the number of CPU cores.
-t, --test_mode: Run the code in test mode, which uses fewer samples and takes less time (several minutes).
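The exact rule used to pick the default number of processes is not stated here. One plausible fallback, shown purely as an assumption, is to use the core count reported by the operating system. `resolve_parallel_num` is a hypothetical helper.

```python
import os


def resolve_parallel_num(parallel_num=None):
    """Return the requested process count, or fall back to the CPU core count."""
    if parallel_num is not None:
        if parallel_num < 1:
            raise ValueError("parallel_num must be at least 1")
        return parallel_num
    # os.cpu_count() can return None on some platforms; use 1 in that case.
    return os.cpu_count() or 1
```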
The sensitivity option executes a series of operations related to algorithm development, performance assessment and robustness analysis based on simulated MID data. The raw data are output to common_data/raw_data/model_data_sensitivity. Unless otherwise specified, all jobs in this analysis rely on the basic model (base_model).
List of jobs
Job name in this script | Simulated data size | MID coverage | Initial solutions | Description |
---|---|---|---|---|
raw_model_all_data | Single | All-available MID | Randomly sampled | Basic optimization based on simulated all-available MID data generated from one known flux vector. |
raw_model_raw_data | Single | Experimentally-available MID | Randomly sampled | Basic optimization based on simulated experimentally-available MID data generated from one known flux vector. |
optimization_from_all_data_average_solutions | Single | All-available MID | Averaged solutions of raw_model_all_data | Optimization starting from averaged solutions of raw_model_all_data based on simulated all-available MID data. |
optimization_from_raw_data_average_solutions | Single | Experimentally-available MID | Averaged solutions of raw_model_raw_data | Optimization starting from averaged solutions of raw_model_raw_data based on simulated experimentally-available MID data. |
optimization_from_batched_simulated_all_data | 30 | All-available MID | Randomly sampled | Optimization based on multiple simulated all-available MID data generated from 30 distantly distributed known flux vectors. |
optimization_from_batched_simulated_raw_data | 30 | Experimentally-available MID | Randomly sampled | Optimization based on multiple simulated experimentally-available MID data generated from 30 distantly distributed known flux vectors. |
optimization_from_batched_simulated_all_data_average_solutions | 30 | All-available MID | Averaged solutions of optimization_from_batched_simulated_all_data | Optimization starting from averaged solutions of optimization_from_batched_simulated_all_data based on simulated all-available MID data. |
optimization_from_batched_simulated_raw_data_average_solutions | 30 | Experimentally-available MID | Averaged solutions of optimization_from_batched_simulated_raw_data | Optimization starting from averaged solutions of optimization_from_batched_simulated_raw_data based on simulated experimentally-available MID data. |
data_sensitivity | Single | Varied in each set | Randomly sampled | Optimization from datasets with different data availability. |
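Several jobs in the table start from "averaged solutions" of an earlier job. The table does not say how solutions are selected before averaging, so the sketch below makes an explicit assumption: keep the solutions with the lowest loss and take their element-wise mean. The function name `average_best_solutions` and the toy numbers are hypothetical.

```python
def average_best_solutions(solutions, losses, keep):
    """Element-wise mean of the `keep` solutions with the lowest loss."""
    if len(solutions) != len(losses):
        raise ValueError("solutions and losses must have the same length")
    if not 1 <= keep <= len(solutions):
        raise ValueError("keep must be between 1 and the number of solutions")
    # Indices of the `keep` lowest-loss solutions (assumed selection rule).
    ranked = sorted(range(len(losses)), key=losses.__getitem__)[:keep]
    size = len(solutions[0])
    return [sum(solutions[i][j] for i in ranked) / keep for j in range(size)]


# Toy flux vectors with two fluxes each; the third solution fits poorly.
solutions = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
losses = [0.1, 0.2, 9.0]
start = average_best_solutions(solutions, losses, keep=2)
```

Here `start` is the mean of the first two vectors, which would then serve as the initial point for the next optimization round.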
The experiments option executes a series of operations related to the analysis of experimental data, including HCT116, renal carcinoma, lung tumor and colon cancer cell lines. The raw data are output to common_data/raw_data/experimental_data_analysis. All tracing experiments rely on U-13C-glucose.
List of jobs
Job name in this script | Model | Data source | Tissue type | Total sample size (combine biological replicates) | Analysis | Optimization number of each sample | Description |
---|---|---|---|---|---|---|---|
hct116_cultured_cell_line | Basic model (base_model) | Published data of HCT-116 labeling experiments Reid et al, 2018 | Cultured colon cancer cell line | 1 | Traditional MFA method | 400 | Reanalyze the HCT-116 cell line data for verification of our pipeline. |
renal_carcinoma_invivo_infusion | Basic model with GLC and CIT buffers (base_model_with_glc_tca_buffer) | Published data of infusion experiments for patients with renal carcinoma Courtney et al, 2018 | Renal carcinoma and brain tumor in patients | 15 | Optimization-averaging algorithm | 100,000 | Analyze the in vivo infusion data through the optimization-averaging algorithm. |
renal_carcinoma_invivo_infusion_traditional_method | Basic model with GLC and CIT buffers (base_model_with_glc_tca_buffer) | Published data of infusion experiments for patients with renal carcinoma Courtney et al, 2018 | Renal carcinoma and brain tumor in patients | 15 | Traditional MFA method | 400 | Analyze the same data with the traditional strategy for comparison. |
lung_tumor_invivo_infusion | Basic model with GLC and CIT buffers (base_model_with_glc_tca_buffer) | Published data of infusion experiments for patients with lung cancer Faubert et al, 2017 | Lung tumor in patients | 35 | Optimization-averaging algorithm | 60,000 | Analyze the in vivo infusion data of multiple kinds of cancer through the optimization-averaging algorithm. |
colon_cancer_cell_line | Basic model (base_model) | New data of eight colon cancer cell lines | Cultured colon cancer cell line | 16 | Optimization-averaging algorithm | 100,000 | Analyze the cultured cell data through the optimization-averaging algorithm to verify our finding. |
colon_cancer_cell_line_traditional_method | Basic model (base_model) | New data of eight colon cancer cell lines | Cultured colon cancer cell line | 16 | Traditional MFA method | 400 | Analyze the same data with the traditional strategy for comparison. |
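To get a feel for the size of each job before launching it, the sample sizes and per-sample optimization numbers from the table can be multiplied. This is a rough sketch under the assumption that a full (non-test) run optimizes every sample the listed number of times. `JOBS` is copied from the table and `total_optimizations` is a hypothetical helper.

```python
# (total sample size, optimization number of each sample), copied from the table above.
JOBS = {
    "hct116_cultured_cell_line": (1, 400),
    "renal_carcinoma_invivo_infusion": (15, 100_000),
    "renal_carcinoma_invivo_infusion_traditional_method": (15, 400),
    "lung_tumor_invivo_infusion": (35, 60_000),
    "colon_cancer_cell_line": (16, 100_000),
    "colon_cancer_cell_line_traditional_method": (16, 400),
}


def total_optimizations(job_name):
    """Samples multiplied by optimizations per sample for one job."""
    samples, per_sample = JOBS[job_name]
    return samples * per_sample


# Jobs sorted from the smallest to the largest workload.
by_workload = sorted(JOBS, key=total_optimizations)
```

Under this assumption, the optimization-averaging jobs involve one to two million optimizations each, which is a reason to try test mode (-t) first.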
There are 12 options under the figure menu, of which 5 are main figures and 6 are supplementary figures. The all option regenerates all figures. For example:
python main.py figure 1
Argument | Figure | Main figure or supplementary figure | Output file |
---|---|---|---|
1 | Figure 1 | Main figure | short_figure_1.pdf |
s1 | Supplementary Figure 1 | Supplementary figure | short_figure_s1.pdf |
s2 | Supplementary Figure 2 | Supplementary figure | short_figure_s2.pdf |
s3 | Supplementary Figure 3 | Supplementary figure | short_figure_s3.pdf |
s4 | Supplementary Figure 4 | Supplementary figure | short_figure_s4.pdf |
s5 | Supplementary Figure 5 | Supplementary figure | short_figure_s5.pdf |
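The output file names in the table follow a single pattern, which can be handy when checking that a figure was actually produced. The sketch below covers only the arguments listed in the table. `FIGURE_ARGUMENTS` and `output_file` are hypothetical names, and the directory the files are written to is not stated here.

```python
# Figure arguments listed in the table above.
FIGURE_ARGUMENTS = ["1", "s1", "s2", "s3", "s4", "s5"]


def output_file(argument):
    """Map a figure argument from the table to its output file name."""
    if argument not in FIGURE_ARGUMENTS:
        raise ValueError(f"unknown figure argument: {argument}")
    return f"short_figure_{argument}.pdf"


expected_files = [output_file(argument) for argument in FIGURE_ARGUMENTS]
```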
Solver description: solver_descriptions.xlsx
Flux raw data: flux_raw_data.xlsx
MID raw data: mid_raw_data.xlsx
Fluxes and simulated MID: simulated_flux_vector_and_mid_data.xlsx
Solver description: solver_descriptions.xlsx
Flux raw data: flux_raw_data.xlsx
MID raw data: mid_raw_data.xlsx
Solver description: solver_descriptions.xlsx
Flux raw data: flux_raw_data.xlsx
MID raw data: mid_raw_data.xlsx
MFA for renal carcinoma (Figure S4)
Solver description: solver_descriptions.xlsx
Flux raw data: flux_raw_data.xlsx
MID raw data: mid_raw_data.xlsx
Flux raw data: flux_raw_data.xlsx
MID raw data: mid_raw_data.xlsx
Solver description: solver_descriptions.xlsx
Flux raw data: flux_raw_data.xlsx
MID raw data: mid_raw_data.xlsx
Flux raw data: flux_raw_data.xlsx
MID raw data: mid_raw_data.xlsx
Shiyu Liu
This software is released under the MIT License.