Skip to content

A high-performance package for automated metabolic flux analysis

License

Notifications You must be signed in to change notification settings

LocasaleLab/Automated-MFA-2023

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Automated Flux Analysis

Synopsis

Developed for 13C metabolic flux analysis (MFA), this tool processes mass isotopomer distribution (MID) data from mass spectrometry to accurately fit MID data and deduce metabolic activities. It includes functionality for MFA, sensitivity analysis, experimental data analysis, and visualization of results for publication.

Requirements

The tool is built for Python 3.8 and requires the following packages:

Packages Version has been tested
numpy 1.22
scipy 1.7
matplotlib 3.6
tqdm 4.64
pandas 1.5.2
sklearn 3.0
xlsxwriter 3.0
numba (optional) 0.56

Anaconda is recommended as it includes most of the necessary packages. The packages provided by Anaconda are also optimized for Intel CPUs, ensuring better performance.

Model

Models utilized in this software are in scripts/model folder.

The basic model ( base_model) contains the base model utilized in algorithm development, analysis to data availability and experimental data analysis for cultured cells.

The basic model with GLC and CIT buffers ( base_model_with_glc_tca_buffer ) contains the model for analysis of in vivo infusion data from patients, which is slightly different from base model in several reactions.

Data

All 13C-isotope labeling data are in scripts/data folder.

Infusion data from patients with renal, brain and lung cancer are from Faubert et al, 2017 (renal_carcinoma/data.xlsx) and Courtney et al, 2018 (lung_tumor/data.xlsx).

Labeling data from cultured cell line HCT-116 are from Reid et al, 2018 (hct116_cultured_cell_line/13C-Glucose_tracing_Mike.xlsx).

Data for the eight colon cancer cell lines are generated in this study ( colon_cancer_cell_line/data.xlsx).

These raw data are loaded and converted to standard form for MFA.

For MFA results, please refer to the Results section.

Algorithm and Solver

Algorithm and solver utilized in this study are located in the scripts/src/core folder.

The model and data folder include some class definition and corresponding processing functions. Specifically, EMU algorithm is encoded in model/emu_analyzer_functions.py.

Most optimizations are based on slsqp_solver and slsqp_numba_solver. As their names indicate, the slsqp_numba_solver is implemented based on numba package for faster execution (roughly 50% time reduction). However, the numba version has the memory leak problem in parallelized executions in Linux system. If running for long time (longer than 50 hours), the normal version is recommended.

Getting started

This script could also be executed as a raw Python project. Make sure Python 3.8 and all required packages are correctly installed. First switch to a target directory and download the source code:

git clone https://github.com/LocasaleLab/Automated-MFA-2023

Switch to the source direct, add PYTHONPATH environment and run the main.py:

cd Automated-MFA-2023
export PYTHONPATH=$PYTHONPATH:`pwd`
python main.py

You could try multiple different arguments according to help information. For example:

python main.py computation experiments flux_analysis hct116_cultured_cell_line -t

This instruction means running a computation, which is a flux_analysis process of data in experiments named hct116_cultured_cell_line in test mode (-t). This process typically completes in 30 minutes.

Detailed argument list will be explained below.

Arguments

There are two different options under the main manu:

  • figure: Option to generate figures in paper. This option must be executed after that analysis results are generated by the computation option.
  • computation: Option to run most computations.

Computations

There are four different options under the computation manu:

  • standard_name: Option to output standard name of metabolites and reactions
  • simulation: Option to generate simulated MID data
  • sensitivity: Option to analyze protocol, model, data and config sensitivity of MFA
  • experiments: Option to run MFA for several experimental data analyses

Standard name

This option will output the standard name of all metabolites and reactions to common_data/raw_data/standard_name.xlsx.

Simulation

This option will generate simulated MID data utilized in algorithm development and robustness analysis. Simulated data Excel and pickle file will be output to the folder common_data/raw_data/simulated_data and scripts/data/simulated_data respectively. This option has following optional parameters:

-b, --batch_num n:

This parameter is used to generated batched (determined by n) known fluxes and corresponding simulated MID data. It is used in verifying the performance of algorithm in multiple simulated data.

-f, --new_flux:

If this optional parameter appear, new known flux optimized from PHDGH mass spectrometry data will be generated. Otherwise, the stored flux vector will be loaded.

-n, --with_noise:

If this optional parameter appear, all-available and experimentally-available MID data will be generated with randomized noise. Otherwise, the precise MID data will be generated.

-i, --index p:

This parameter will add an extra number suffix p to generated simulated data file. It is usually used to distinguish the newly generated simulated data.

Common Arguments of Sensitivity and Experiments

Usage: python main.py computation {sensitivity, experiments} running_mode job_name

Positional arguments

running_mode: Running mode of the script.

  • flux_analysis: Option to start a new flux analysis process to the target job.
  • result_process: Option to process analysis results of the target job.
  • solver_output: Option to output detailed model, data and configurations of the target job.
  • raw_experimental_data_plotting: Only available in experiments mode. Option to display the raw experimental data of target job.

job_name: Name of target job. List of available jobs are listed below.

Optional arguments

-p, --parallel_num:

Number of parallel processes. If not provided, it will be selected according to CPU cores.

-t, --test_mode:

Whether the code is executed in test mode, which means less sample number and shorter time (several minutes).

Sensitivity

This option will execute series of operations related to algorithm development, performance assessment and robustness analysis based on simulated MID data. Their raw data will be output to common_data/raw_data/model_data_sensitivity. If not specified, all jobs in this analysis rely on the basic model (base_model).

List of jobs

Job name in this script Simulated data size MID coverage Initial solutions Description
raw_model_all_data Single All-available MID Randomly sampled Basic optimization based on simulated all-available MID data generated from one known flux vector.
raw_model_raw_data Single Experimentally-available MID Randomly sampled Basic optimization based on simulated experimentally-available MID data generated from one known flux vector.
optimization_from_all_data_average_solutions Single All-available MID Averaged solutions of raw_model_all_data Optimization starting from averaged solutions of raw_model_all_data based on simulated all-available MID data.
optimization_from_raw_data_average_solutions Single Experimentally-available MID Averaged solutions of raw_model_raw_data Optimization starting from averaged solutions of raw_model_raw_data based on simulated experimentally-available MID data.
optimization_from_batched_simulated_all_data 30 All-available MID Randomly sampled Optimization based on multiple simulated all-available MID data generated from 30 distantly distributed known flux vectors.
optimization_from_batched_simulated_raw_data 30 Experimentally-available MID Randomly sampled Optimization based on multiple simulated experimentally-available MID data generated from 30 distantly distributed known flux vectors.
optimization_from_batched_simulated_all_data_average_solutions 30 All-available MID Averaged solutions of optimization_from_batched_simulated_all_data Optimization starting from averaged solutions of optimization_from_batched_simulated_all_data based on simulated all-available MID data.
optimization_from_batched_simulated_raw_data_average_solutions 30 Experimentally-available MID Averaged solutions of optimization_from_batched_simulated_raw_data Optimization starting from averaged solutions of optimization_from_batched_simulated_raw_data based on simulated experimentally-available MID data.
data_sensitivity Single Varied in each set Randomly sampled Optimization from datasets with different data availability.

Experiments

This option will execute series of operations related to analysis to experimental data, including HCT116, renal carcinoma, lung tumor and colon cancer cell lines. Their raw data will be output to common_data/raw_data/experimental_data_analysis. All tracing experiments rely on U-13C-glucose.

List of jobs

Job name in this script Model Data source Tissue type Total sample size
(combine biological replicates)
Analysis Optimization number of each sample Description
hct116_cultured_cell_line Basic model (base_model) Published data of HCT-116 labeling experiments Reid et al, 2018 Cultured colon cancer cell line 1 Traditional MFA method 400 Reanalyze the HCT-116 cell line data for verification of our pipeline.
renal_carcinoma_invivo_infusion Basic model with GLC and CIT buffers (base_model_with_glc_tca_buffer) Published data of infusion experiments for patients with renal carcinomaCourtney et al, 2018 Renal carcinoma and brain tumor in patients 15 Optimization-averaging algorithm 100,000 Analyze the in vivo infusion data through the optimization-averaging algorithm.
renal_carcinoma_invivo_infusion_traditional_method Basic model with GLC and CIT buffers (base_model_with_glc_tca_buffer) Published data of infusion experiments for patients with renal carcinomaCourtney et al, 2018 Renal carcinoma and brain tumor in patients 15 Traditional MFA method 400 Analyze the same data with the traditional strategy for comparison.
lung_tumor_invivo_infusion Basic model with GLC and CIT buffers (base_model_with_glc_tca_buffer) Published data of infusion experiments for patients with lung cancer Faubert et al, 2017 Lung tumor in patients 35 Optimization-averaging algorithm 60,000 Analyze the in vivo infusion data of multiple kinds of cancer through the optimization-averaging algorithm.
colon_cancer_cell_line Basic model (base_model) New data of eight colon cancer cell lines Cultured colon cancer cell line 16 Optimization-averaging algorithm 100,000 Analyze the cultured cell data through the optimization-averaging algorithm to verify our finding.
colon_cancer_cell_line_traditional_method Basic model (base_model) New data of eight colon cancer cell lines Cultured colon cancer cell line 16 Traditional MFA method 400 Analyze the same data with the traditional strategy for comparison.

Figures

There are 12 different options under the figure manu, of which 5 are main figures, and 6 are supplementary figures. The all option can regenerate all figures. For example:

python main.py figure 1
Arguments Figures Main figure or supplementary figure Output files
1 Figure 1 Main figure short_figure_1.pdf
s1 Supplementary Figure 1 Supplementary figure short_figure_s1.pdf
s2 Supplementary Figure 2 Supplementary figure short_figure_s2.pdf
s3 Supplementary Figure 3 Supplementary figure short_figure_s3.pdf
s4 Supplementary Figure 4 Supplementary figure short_figure_s4.pdf
s5 Supplementary Figure 5 Supplementary figure short_figure_s5.pdf

Results

MFA for HCT116 cultured cancer cell line (Figure 1, Figure S1)

Solver description: solver_descriptions.xlsx

Flux raw data: flux_raw_data.xlsx

MID raw data: mid_raw_data.xlsx

Simulated data

Fluxes and simulated MID: simulated_flux_vector_and_mid_data.xlsx

Development and benchmark of the algorithm (Figure 1, Figure S2, Figure S3)

All-available MID data:

Solver description: solver_descriptions.xlsx

Flux raw data: flux_raw_data.xlsx

MID raw data: mid_raw_data.xlsx

Experimentally-available MID data:

Solver description: solver_descriptions.xlsx

Flux raw data: flux_raw_data.xlsx

MID raw data: mid_raw_data.xlsx

MFA for renal carcinoma (Figure S4)

Results from optimization-averaging algorithm:

Solver description: solver_descriptions.xlsx

Flux raw data: flux_raw_data.xlsx

MID raw data: mid_raw_data.xlsx

Benchmark results

Flux raw data: flux_raw_data.xlsx

MID raw data: mid_raw_data.xlsx

MFA for 8 cultured colon cancer cell lines (Figure 1, Figure S5)

Results from optimization-averaging algorithm:

Solver description: solver_descriptions.xlsx

Flux raw data: flux_raw_data.xlsx

MID raw data: mid_raw_data.xlsx

Benchmark results

Flux raw data: flux_raw_data.xlsx

MID raw data: mid_raw_data.xlsx

Contributors

Shiyu Liu

License

This software is released under the MIT License.

About

A high-performance package for automated metabolic flux analysis

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages