Model-based Optimization of Cell-free Enzymatic Cascades Under Uncertainty Exemplified for the Production of GDP-Fucose - Data Repository
This repository contains all files necessary to reproduce the findings of the publication "Model-based Optimization of Cell-free Enzymatic Cascades Under Uncertainty Exemplified for the Production of GDP-Fucose". It is organized according to the structure of the presented results: (1) model building, including the creation of an ensemble of kinetic model parameters via repeated global parameter estimation, followed by model-based optimization to (2) maximize the product titer (objective O1), (3) minimize the enzyme load (objective O3), and (4) minimize the normalized process costs (objective O6).
All relevant files for each result (1-4) are stored in separate directories with the following contents: COPASI files (.cps), which contain the model description and define the specific task to be calculated (such as a parameter estimation or an optimization); experimental data files (.txt) with time course measurements; Python scripts (.py) for calculating and visualizing the specified tasks; comma-separated value (.csv) or Python-specific pickle (.pkl) files that store the calculation results; and the resulting figures, stored as PDF, PNG, and SVG vector graphics.
The script files were written and tested with Python 3.8.3 and the following additional packages: numpy 1.24.2, pandas 1.5.3, copasi-basico 0.47, tqdm 4.47.0, matplotlib 3.7.0, and seaborn 0.12.2. Installing COPASI is not necessary to reproduce the results of this work, since its Python API basiCO handles all calculations. However, COPASI is a convenient tool for inspecting the model files (.cps).
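Before running the scripts, it can help to compare the installed package versions with the ones listed above. The following is a minimal sketch using only the standard library; the function name `check_versions` and its `lookup` argument are illustrative and not part of this repository.

```python
from importlib import metadata

# Package versions as listed in this README.
EXPECTED = {
    "numpy": "1.24.2",
    "pandas": "1.5.3",
    "copasi-basico": "0.47",
    "tqdm": "4.47.0",
    "matplotlib": "3.7.0",
    "seaborn": "0.12.2",
}


def check_versions(expected=EXPECTED, lookup=metadata.version):
    """Return {package: (expected, installed or None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `check_versions()` without arguments queries the active environment; an empty dictionary means every listed package is installed in the listed version. Other versions may still work, but they were not tested.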
COPASI model files (.cps) can be regarded as extended SBML files: they contain all information of a standard SBML file (basic model properties) and additionally include information on specified tasks, plots, and report files. COPASI files can be opened with a text editor to reveal the underlying SBML-like tree structure, and every .cps file can be saved as an SBML file, which then contains all defining properties of the model but no task or plotting information. More information on the compatibility between the COPASI and SBML file formats can be found here.
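Because a .cps file is plain XML, its tree structure can also be inspected programmatically without COPASI. The sketch below lists the elements directly under the root together with their number of children; the helper name `top_level_sections` is hypothetical, and which sections appear depends on the file.

```python
import xml.etree.ElementTree as ET


def top_level_sections(cps_source):
    """List (tag, number of children) for each element directly under the root.

    `cps_source` is a path to a .cps file or an open file object.
    """
    root = ET.parse(cps_source).getroot()

    def strip_namespace(tag):
        # ElementTree prefixes tags with "{namespace}"; keep only the local name.
        return tag.split("}", 1)[-1]

    return [(strip_namespace(child.tag), len(child)) for child in root]
```

For example, `top_level_sections("GDP-Fucose_v7XGSK_with_PE9XGSK_setup.cps")` gives a quick overview of which parts of the file hold the model and which hold task, report, or plot definitions.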
All Python calculation scripts make use of basiCO, a simplified COPASI Python API, which requires each task to be set in a model file. Therefore, a separate COPASI model file was created for each task. These files differ only in the specific task they define; the basic model with all of its defining properties (species, reactions, rate laws, and events) is the same in all COPASI files. Furthermore, any initial concentrations of substrates and enzymes set in the model files are overwritten by the respective calculation scripts, which redefine them according to the specifications of the given task. Each Python script relies only on the packages imported at the beginning of the file (no additional custom functions), and all scripts are designed to be run from top to bottom. The order of execution of the script files in each directory, as well as the interdependency between them, is denoted by their alphabetical identifiers, which are detailed in the sections below.
To reproduce the output files, navigate to the directories described below and use the Python interpreter to run the script files in the designated order.
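Running the scripts one after another can also be automated. The following is a minimal sketch, assuming that the scripts resolve their input and output files relative to the directory they are stored in; the function name `run_in_order` and the directory path in the usage note are placeholders.

```python
import subprocess
import sys
from pathlib import Path


def run_in_order(directory, scripts):
    """Run each script with the current Python interpreter, in the given order.

    Execution stops at the first script that exits with an error.
    """
    directory = Path(directory)
    for script in scripts:
        print(f"Running {script} ...")
        # cwd=directory so that relative paths inside the scripts resolve
        # against the directory holding the model and data files.
        subprocess.run([sys.executable, script], cwd=directory, check=True)
```

For the model building directory, a call could look like `run_in_order("path/to/model_building", ["rand_param_sampling.py", "visualize_params.py", "calc_time_courses_InitFE18_07FKP.py", "visualize_time_courses_InitFE18_07FKP.py"])`, where the directory path is hypothetical and the script names are those listed in the table for that section.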
For the model building (result 1), the global parameter estimation is set in the COPASI model file and includes all experimental data files in the directory (experiments 1-5). Calculation script a repeats the parameter estimation 100 times. The resulting 100 parameter sets are stored in a .csv file, which is read by visualization script a to generate the multi-panel histogram figure. Calculation script b also reads the parameter estimation output and uses the model file, set to the baseline initial concentrations, to create time course simulations for each parameter set. Visualization script b then plots the time courses together with the data of the baseline experiment.
File Type | File Name |
---|---|
COPASI model file: | GDP-Fucose_v7XGSK_with_PE9XGSK_setup.cps |
Experimental data file 1: | 2022_11_08_FE13_for_model_fiting.txt |
Experimental data file 2: | 2022_11_17_FE17_for_model_fiting_1FKP.txt |
Experimental data file 3: | 2022_11_17_FE17_for_model_fiting_07FKP.txt |
Experimental data file 4: | 2022_11_24_FE18_for_model_fiting_1FKP.txt |
Experimental data file 5: | 2022_11_24_FE18_for_model_fiting_07FKP.txt |
Calculation script a): | rand_param_sampling.py |
Calculation script b): | calc_time_courses_InitFE18_07FKP.py |
Calculation output a): | sampling_output_EvoStrat_100runs.csv |
Calculation output b): | GDP-Fucose_v7XGSK_PE9XGSK_EvoStrat100x_TCSimResults_InitFE18_07FKP.pkl |
Visualization script a): | visualize_params.py |
Visualization script b): | visualize_time_courses_InitFE18_07FKP.py |
Visualization output a): | GDP-Fucose_v7XGSK_PE9XGSK_EvoStrat100x_Params_fig |
Visualization output b): | GDP-Fucose_v7XGSK_PE9XGSK_EvoStrat100x_TCs_InitFE18_07FKP_fig |
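The repeat-and-collect pattern behind the parameter ensemble can be sketched as follows. This is not the code of `rand_param_sampling.py`: `dummy_estimate` is a hypothetical stand-in that only draws random numbers so the example runs without COPASI, and the parameter names `k1` and `k2` as well as the CSV layout are assumptions.

```python
import csv
import random


def collect_parameter_sets(estimate_once, n_runs, out_path):
    """Call `estimate_once()` n_runs times (n_runs >= 1) and write one CSV row per run.

    `estimate_once` must return a dict mapping parameter name -> fitted value.
    """
    rows = [estimate_once() for _ in range(n_runs)]
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return rows


def dummy_estimate():
    # Hypothetical stand-in for one global parameter estimation run.
    return {"k1": random.lognormvariate(0, 1), "k2": random.lognormvariate(0, 1)}
```

In the actual workflow, the stand-in would be replaced by a function that runs the parameter estimation task defined in the model file and returns the fitted values; each row of the resulting file then corresponds to one member of the ensemble.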
For the maximization of the product titer (objective O1), the optimization is set in the COPASI model file. Calculation script a sets the model to each parameter set of the ensemble and repeats the optimization for each set (100 optimization runs in total). The result is stored in a pickle file, which is read by visualization script a to create the box plot / scatter plot figure. Calculation script b reads the model file, the optimization output, and the parameter ensemble in order to perform the cross-validation and the subsequent scoring. The results of the cross-validation and the scoring are stored in pickle files (b1 and b2), which are read by visualization script b to produce the heat map figure. Calculation script c simulates the model ensemble with the initial concentrations set to the best-scoring optimization result (O53) for comparison with the validation experiment. The simulation result is stored in a pickle file, which is read by visualization script c to create the validation time courses figure.
For the minimization of the enzyme load (objective O3), the optimization is set in the COPASI model file. Calculation script a sets the model to each parameter set of the ensemble and repeats the optimization for each set (100 optimization runs in total). The result is stored in a pickle file, which is read by visualization script a to create the box plot / scatter plot figure. Calculation script b reads the model file, the optimization output, and the parameter ensemble in order to perform the cross-validation and the subsequent scoring. The results of the cross-validation and the scoring are stored in pickle files (b1, b2, and b3), which are read by visualization script b to produce the heat map figure. Calculation script c simulates the model ensemble with the initial concentrations set to the best-scoring optimization result (O88) for comparison with the validation experiment. The simulation result is stored in a pickle file, which is read by visualization script c to create the validation time courses figure.
For the minimization of the normalized process costs (objective O6), the optimization is set in the COPASI model file. Calculation script a sets the model to each parameter set of the ensemble and repeats the optimization for each set (100 optimization runs in total). The result is stored in a pickle file, which is read by visualization script a to create the box plot / scatter plot figure. Calculation script b reads the model file, the optimization output, and the parameter ensemble in order to perform the cross-validation and the subsequent scoring. The results of the cross-validation and the scoring are stored in pickle files (b1 and b2), which are read by visualization script b to produce the heat map figure.
File Type | File Name |
---|---|
COPASI model file: | GDP-Fucose_v7XGSK_PE9XGSK_with_Opt9d_setup.cps |
Calculation script a): | Opt9d_calc.py |
Calculation script b): | select_best_Opt9d_for_all_param_sets_calc.py |
Calculation output a): | GDP-Fucose_v7XGSK_PE9XGSK_Opt9d_RandParamSampl_EvoStrat100x_list_res_stats.pkl |
Calculation output b1): | GDP-Fucose_v7XGSK_PE9XGSK_Opt9d_SlctBestOptAllParams_EvoStrat_100x_allCPerP.pkl |
Calculation output b2): | GDP-Fucose_v7XGSK_PE9XGSK_Opt9d_SlctBestOptAllParams_EvoStrat100x_score_df.pkl |
Visualization script a): | Opt9d_vis.py |
Visualization script b): | select_best_Opt9d_for_all_param_sets_vis.py |
Visualization output a): | GDP-Fucose_v7XGSK_PE9XGSK_EvoStrat100x_Opt9d_BoxScatterPlot_fig |
Visualization output b): | GDP-Fucose_v7XGSK_PE9XGSK_EvoStrat100x_Opt9d_CostPerProdHeatmap_fig |
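The cross-validation performed by calculation script b evaluates every optimization result under every parameter set of the ensemble. The sketch below illustrates the idea with plain Python. The names `cross_validate`, `rank_by_mean`, and `simulate` are hypothetical, and ranking by the mean objective value is an assumption made for illustration only; the scoring rule used in the actual scripts may differ.

```python
def cross_validate(conditions, parameter_sets, simulate):
    """Evaluate every optimized condition under every parameter set.

    Returns a matrix where result[i][j] is the objective value of
    conditions[i] simulated with parameter_sets[j].
    """
    return [[simulate(c, p) for p in parameter_sets] for c in conditions]


def rank_by_mean(matrix, minimize=True):
    """Return condition indices ordered from best to worst mean objective value.

    Assumption: the mean across parameter sets serves as the score.
    """
    means = [sum(row) / len(row) for row in matrix]
    return sorted(range(len(matrix)), key=lambda i: means[i], reverse=not minimize)
```

Here `simulate(condition, parameter_set)` stands for a model simulation that returns the objective value, for example the normalized process costs. A matrix of this shape is what a heat map of optimization results versus parameter sets would display.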