Flux Balance Analyses with COBRApy
Understanding biology by analyzing high-dimensional -OMICS data sets is a general trend these days. This high-dimensional data could be a result from an experimental run on a Next-Generation Sequencing (NGS) platform or a mass-spectrometer platform or a similar high-throughput technique. In case of NGS, generally the measured molecules are DNA/RNA. In case of mass-spec, the measured ionized molecules could be proteins, metabolites or lipids.
This trend of gather-analyze-report data in biological space is increasing at an exponential space given the lowering cost of sequencing genomes using NGS technologies.
Systematically analyzing such datasets is critical to draw meaningful observations. Moreover, a technique to integrate the diverse omics datasets (e.g., transcriptomics and proteomics data for the same biological sample) is also a new trend that is increasing. The advantage with such integration is to infer insights using the information from transcript and protein profiles.
Metabolic network models provide an interesting way to overlay this diverse information i.e., data sets emerging from different technologies a.k.a NGS based transcripts, mass-spec baed protein abundances, and 13C mass-spec based metabolic fluxes.
Before going into integrating these datasets (possibly in future posts), it is essential to understand the metabolic network models from informatics point of view. These days, the metabolic network models are built computationally using the genome sequence and homology based methods and further refined using other resources, e.g., manual curation using previous publications. The models built are then stored in json, xml, SBML formats for further computational analyses. More information on such an effort in E. coli and other organisms can be found here.
Here I focus on analyzing one such E. coli metabolic network iAF1260 model using a mathematical technique called Flux Balance Analysis (FBA). FBA is an approach to computationally analyze the flow of metabolites in a metabolic network (paper on FBA). Such methods of using genome-scale metabolic network model to analyze the flow of metabolites generally are called COBRA based methods.
Previously, in a different project at Wilke lab, we used matlab version of COBRA. Here, I use COBRApy, a python version of COBRA. Most of the information on the COBRApy modules used can be found here
I used some of the modules mentioned above to fit my research needs and specifically here, I show how to 1. read the metabolic network model, 2. generate optimal fluxes, 3. knock-out reactions, 4. change growth media, 5. report data using the jupyter notebook.
Let’s import modules needed. cobra is a module needed to analyze different metabolic network models.
from __future__ import print_function
import cobra
import os
from os.path import join
The model we’ll analyze is iAF1260.
sbml_path = join("Data","iAF1260.xml.gz")
print(sbml_path)
Data\iAF1260.xml.gz
The format of the model is SBML. SBML stands for systems biology markup model. More info here
iAF1260_ecoli_model = cobra.io.read_sbml_model(sbml_path)
The model.optimize() call is the crux of the analyses. Given the objective function, this optimize find a feasible solution i.e., solution here is a vector of in-silico metabolic fluxes.
# Here the objective function is biomass and optimize function calculates the fluxes to get the max biomass (this can be changed)
iAF1260_ecoli_model.optimize()
Optimal solution with objective value 0.737
<div>
fluxes | reduced_costs | |
---|---|---|
12DGR120tipp | 0.000000 | -0.010031 |
12DGR140tipp | 0.000000 | -0.010031 |
12DGR141tipp | 0.000000 | -0.016049 |
12DGR160tipp | 0.000000 | -0.010031 |
12DGR161tipp | 0.000000 | -0.018055 |
... | ... | ... |
ZN2abcpp | 0.000000 | -0.008025 |
ZN2t3pp | 0.000000 | -0.002006 |
ZN2tpp | 0.002327 | 0.000000 |
ZNabcpp | 0.000000 | -0.008025 |
Zn2tex | 0.002327 | 0.000000 |
2382 rows × 2 columns
</div>
The function model.summary() prints out the exchange fluxes, along with the optimized solution. In this case, objective function we’re trying to maximize is the model growth (i.e., Biomass) and looks like the value is 0.737.
iAF1260_ecoli_model.summary()
IN FLUXES OUT FLUXES OBJECTIVES
------------------- ------------ ----------------------
o2_e 16.3 h2o_e 37.2 BIOMASS_Ec_i... 0.737
glc__D_e 8 co2_e 17.8
nh4_e 7.94 h_e 6.77
pi_e 0.708
so4_e 0.184
k_e 0.131
mg2_e 0.00582
fe2_e 0.00556
fe3_e 0.00523
ca2_e 0.00349
cl_e 0.00349
cobalt2_e 0.00233
cu2_e 0.00233
mn2_e 0.00233
mobd_e 0.00233
zn2_e 0.00233
We can also knowck-out reactions and re-run the optimize function and calculate the new fluxes.
# remove nitrogen source and look at Biomass objective function
# here the only nitrogen source seems to be nh4_e.
iAF1260_ecoli_model.reactions.NH4tex.knock_out()
iAF1260_ecoli_model.optimize()
Optimal solution with objective value 0.000
<div>
fluxes | reduced_costs | |
---|---|---|
12DGR120tipp | 0.000000e+00 | 3.330669e-16 |
12DGR140tipp | 0.000000e+00 | 3.330669e-16 |
12DGR141tipp | 0.000000e+00 | 6.106227e-16 |
12DGR160tipp | 0.000000e+00 | 3.330669e-16 |
12DGR161tipp | 0.000000e+00 | 6.106227e-16 |
... | ... | ... |
ZN2abcpp | 0.000000e+00 | 2.220446e-16 |
ZN2t3pp | 0.000000e+00 | 5.551115e-17 |
ZN2tpp | 7.906134e-18 | 0.000000e+00 |
ZNabcpp | 0.000000e+00 | 2.220446e-16 |
Zn2tex | 7.906134e-18 | 0.000000e+00 |
2382 rows × 2 columns
</div>
# get the original model
iAF1260_ecoli_model = cobra.io.read_sbml_model(sbml_path)
We can also control the concentration of the media to mimic real-situations. The values adjusted could reflect generally used media such as minimal media etc.
# change glucose source fluxes to different values and see how it affects the objective function
iAF1260_ecoli_model.medium
{'EX_ca2_e': 999999.0,
'EX_cbl1_e': 0.01,
'EX_cl_e': 999999.0,
'EX_co2_e': 999999.0,
'EX_cobalt2_e': 999999.0,
'EX_cu2_e': 999999.0,
'EX_fe2_e': 999999.0,
'EX_fe3_e': 999999.0,
'EX_glc__D_e': 8.0,
'EX_h2o_e': 999999.0,
'EX_h_e': 999999.0,
'EX_k_e': 999999.0,
'EX_mg2_e': 999999.0,
'EX_mn2_e': 999999.0,
'EX_mobd_e': 999999.0,
'EX_na1_e': 999999.0,
'EX_nh4_e': 999999.0,
'EX_o2_e': 18.5,
'EX_pi_e': 999999.0,
'EX_so4_e': 999999.0,
'EX_tungs_e': 999999.0,
'EX_zn2_e': 999999.0}
This is one of the simpler ways to change the concentration. In this specific case, initially the glucose concentration in the media is 8. Here, we read all the media related fluxes to a vector, change the vector and initialize the media with this new vector. Below, we change the concentration from 8 to 20.
# currently EX_glc__D_e is at 8, change it to 20 (Instead of knocking out NH4tex, this is another way of changing the source)
# copy the mediums, change the values and put the medium back in the model (there might be simpler way)
medium = iAF1260_ecoli_model.medium
medium["EX_glc__D_e"] = 20
iAF1260_ecoli_model.medium = medium
iAF1260_ecoli_model.optimize()
Optimal solution with objective value 1.218
<div>
fluxes | reduced_costs | |
---|---|---|
12DGR120tipp | 0.000000 | -0.024236 |
12DGR140tipp | 0.000000 | -0.024236 |
12DGR141tipp | 0.000000 | -0.043625 |
12DGR160tipp | 0.000000 | -0.024236 |
12DGR161tipp | 0.000000 | -0.043625 |
... | ... | ... |
ZN2abcpp | 0.000000 | -0.019389 |
ZN2t3pp | 0.000000 | -0.004847 |
ZN2tpp | 0.003846 | 0.000000 |
ZNabcpp | 0.000000 | -0.019389 |
Zn2tex | 0.003846 | 0.000000 |
2382 rows × 2 columns
</div>
Recalculated fluxes with the new glucose concentration.
iAF1260_ecoli_model.summary()
IN FLUXES OUT FLUXES OBJECTIVES
------------------- ------------------- ---------------------
glc__D_e 20 h_e 53 BIOMASS_Ec_i... 1.22
o2_e 18.5 h2o_e 41.6
nh4_e 13.1 for_e 23.1
pi_e 1.17 ac_e 18.7
so4_e 0.305 co2_e 9.53
k_e 0.216 glyclt_e 0.000815
mg2_e 0.00961
fe2_e 0.0092
fe3_e 0.00865
ca2_e 0.00577
cl_e 0.00577
cobalt2_e 0.00385
cu2_e 0.00385
mn2_e 0.00385
mobd_e 0.00385
zn2_e 0.00385
If we plan to suck out glucose from the E. coli model?
# Repeat the above by changing it to -20 (see the negative sign)
medium["EX_glc__D_e"] = -20
iAF1260_ecoli_model.medium = medium
Optimization seems to result in infeasible solution. More information on how FBA analyses is done, please refer to the original paper by Orth J. D. et. al.
iAF1260_ecoli_model.optimize()
cobra\util\solver.py:416 UserWarning: solver status is 'infeasible'
infeasible solution
Let’s take the glucose concentration back to 8.
# Back to original value of 8
medium["EX_glc__D_e"] = 8
iAF1260_ecoli_model.medium = medium
iAF1260_ecoli_model.optimize()
Optimal solution with objective value 0.737
<div>
fluxes | reduced_costs | |
---|---|---|
12DGR120tipp | 0.000000 | -0.010031 |
12DGR140tipp | 0.000000 | -0.010031 |
12DGR141tipp | 0.000000 | -0.018055 |
12DGR160tipp | 0.000000 | -0.010031 |
12DGR161tipp | 0.000000 | -0.018055 |
... | ... | ... |
ZN2abcpp | 0.000000 | -0.008025 |
ZN2t3pp | 0.000000 | -0.002006 |
ZN2tpp | 0.002327 | 0.000000 |
ZNabcpp | 0.000000 | -0.008025 |
Zn2tex | 0.002327 | 0.000000 |
2382 rows × 2 columns
</div>
To get an idea of energy production:
# Get a sense of energy production and related features
iAF1260_ecoli_model.metabolites.atp_c.summary()
PRODUCING REACTIONS -- ATP C10H12N5O13P3 (atp_c)
------------------------------------------------
% FLUX RXN ID REACTION
--- ------ ---------- --------------------------------------------------
72% 52 ATPS4rpp adp_c + 4.0 h_p + pi_c <=> atp_c + h2o_c + 3.0 h_c
18% 13.1 PGK 3pg_c + atp_c <=> 13dpg_c + adp_c
5% 3.34 SUCOAS atp_c + coa_c + succ_c <=> adp_c + pi_c + succoa_c
4% 2.72 PPK atp_c + pi_c <=> adp_c + ppi_c
1% 1.03 PYK adp_c + h_c + pep_c --> atp_c + pyr_c
CONSUMING REACTIONS -- ATP C10H12N5O13P3 (atp_c)
------------------------------------------------
% FLUX RXN ID REACTION
--- ------ ---------- --------------------------------------------------
61% 44.2 BIOMASS... 0.000223 10fthf_c + 0.000223 2ohph_c + 0.5137 a...
12% 8.39 ATPM atp_c + h2o_c --> adp_c + h_c + pi_c
9% 6.19 PFK atp_c + f6p_c --> adp_c + fdp_c + h_c
3% 1.99 NDPK1 atp_c + gdp_c <=> adp_c + gtp_c
2% 1.78 ACCOAC accoa_c + atp_c + hco3_c --> adp_c + h_c + malc...
2% 1.33 GLNS atp_c + glu__L_c + nh4_c --> adp_c + gln__L_c +...
1% 0.788 ASPK asp__L_c + atp_c <=> 4pasp_c + adp_c
1% 0.687 R15BPK atp_c + r15bp_c --> adp_c + prpp_c
1% 0.687 R1PK atp_c + r1p_c --> adp_c + h_c + r15bp_c
Initially we used the default optimization function i.e., biomass maximization. With FBA, you can also change this objective function to mimic your study. For example, you can find new set of fluxes when maximizing ATPM.
# Instead of maximizing biomass, we can change the objective function to maximize ATPM
iAF1260_ecoli_model.objective = "ATPM"
iAF1260_ecoli_model.reactions.get_by_id("ATPM").upper_bound
8.39
iAF1260_ecoli_model.reactions.get_by_id("ATPM").lower_bound
8.39
Most likely, the lower and upper bound of such reactions would be set to experimentally defined value (manual curation). In order to find the max/min, you should set the upper and lower bound values to reasonable values.
# Looks like the ATPM upper bound and lower bound is fixed at 8.39.
# If we run the model now, the optimum calculated will be same.
# INSTEAD, change the upper and lower bounds to different values and see the optimum with objective function of ATPM
iAF1260_ecoli_model.reactions.get_by_id("ATPM").upper_bound = 1000
iAF1260_ecoli_model.reactions.get_by_id("ATPM").lower_bound = -1000
iAF1260_ecoli_model.reactions.get_by_id("ATPM").upper_bound
1000
iAF1260_ecoli_model.reactions.get_by_id("ATPM").lower_bound
-1000
iAF1260_ecoli_model.optimize()
Optimal solution with objective value 95.813
<div>
fluxes | reduced_costs | |
---|---|---|
12DGR120tipp | 0.0 | -2.5 |
12DGR140tipp | 0.0 | -2.5 |
12DGR141tipp | 0.0 | -4.5 |
12DGR160tipp | 0.0 | -2.5 |
12DGR161tipp | 0.0 | -4.5 |
... | ... | ... |
ZN2abcpp | 0.0 | -2.0 |
ZN2t3pp | 0.0 | -0.5 |
ZN2tpp | 0.0 | 0.0 |
ZNabcpp | 0.0 | -2.0 |
Zn2tex | 0.0 | 0.0 |
2382 rows × 2 columns
</div>
The ATPM seems to change significantly. This might not reflect true scenario and might be artifact. However the benefits of using models is to quickly simulate to see if there is anything unusual before designing such experiments.
# Looks like the ATPM value went from 9 all the way to 95 (based on constraints on other reactions)
# Summary of atp reactions with this new objective function is
iAF1260_ecoli_model.metabolites.atp_c.summary()
PRODUCING REACTIONS -- ATP C10H12N5O13P3 (atp_c)
------------------------------------------------
% FLUX RXN ID REACTION
--- ------ -------- --------------------------------------------------
61% 63.8 ATPS4rpp adp_c + 4.0 h_p + pi_c <=> atp_c + h2o_c + 3.0 h_c
15% 16 PGK 3pg_c + atp_c <=> 13dpg_c + adp_c
14% 14.8 ACKr ac_c + atp_c <=> actp_c + adp_c
8% 8 PYK adp_c + h_c + pep_c --> atp_c + pyr_c
1% 1.25 SUCOAS atp_c + coa_c + succ_c <=> adp_c + pi_c + succoa_c
CONSUMING REACTIONS -- ATP C10H12N5O13P3 (atp_c)
------------------------------------------------
% FLUX RXN ID REACTION
--- ------ -------- --------------------------------------------------
92% 95.8 ATPM atp_c + h2o_c <=> adp_c + h_c + pi_c
8% 8 PFK atp_c + f6p_c --> adp_c + fdp_c + h_c
Feel free to contact me if you need more information or if you want to set-up such a study! I hope this is helpful.