Synthetic Biology Journal


Machine Learning Applications in Genome-scale Metabolic Model Reconstruction and Curation

Ke WU1,2, Jiahao LUO1,2, Feiran LI1,2   

  1. Institute of Biopharmaceutical and Health Engineering, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
  2. Key Laboratory for Industrial Biocatalysis, Ministry of Education, Institute of Biochemical Engineering, Department of Chemical Engineering, Tsinghua University, Beijing 100084, China
  • Received: 2024-12-02  Revised: 2025-02-12  Published: 2025-02-14
  • Contact: Feiran LI

Machine Learning-Driven Construction and Optimization of Genome-scale Metabolic Models

Ke WU1,2, Jiahao LUO1,2, Feiran LI1,2

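  1. Institute of Biopharmaceutical and Health Engineering, Tsinghua Shenzhen International Graduate School, Shenzhen, Guangdong 518055, China
  2. Key Laboratory for Industrial Biocatalysis, Ministry of Education, Department of Chemical Engineering, Tsinghua University, Beijing 100084, China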
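  • Corresponding author: Feiran LI
  • About the authors: Ke WU (born 2000), male, Ph.D. candidate. Research interest: machine learning-assisted development of genome-scale metabolic models. E-mail: wk37@tju.edu.cn
    Feiran LI (born 1993), female, assistant professor and doctoral supervisor. Research interests: development of genome-scale metabolic models, design of microbial cell factories, and enzyme parameter prediction, with the goal of building digital models of life. E-mail: feiranli@sz.tsinghua.edu.cn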
  • Funding:
    General Program of the National Natural Science Foundation of China (22478223)

Abstract:

Since the first genome-scale metabolic model (GEM) was published in 1999, GEMs have become essential tools for analyzing biological metabolism. A GEM integrates metabolic genes, metabolites, and reactions, and combines a stoichiometric matrix with constraint-based optimization to systematically describe and simulate an organism's metabolic processes. Automated reconstruction pipelines have extended GEMs to organisms from all kingdoms of life. GEMs can also integrate kinetic parameters, thermodynamic parameters, multi-omics data, and multicellular processes to yield more accurate models with better predictive power. However, GEM reconstruction still depends heavily on pre-existing knowledge, which inherently limits a model's scope to currently available information and restricts our ability to fully unravel the complexity and dynamics of metabolism. Recent advances in machine learning have shown remarkable capabilities in biological tasks such as protein structure prediction and disease identification, as well as in GEM-related tasks such as functional annotation and large-scale data integration, demonstrating its power to identify patterns and uncover hidden relationships within biological systems. Machine learning therefore offers a promising route to overcome the limitations of GEMs, extending them into areas previously constrained by data availability and complexity. This review summarizes traditional GEM reconstruction methods and their use in integrating multi-dimensional data to build multi-constraint and multi-process models. It also focuses on key applications of machine learning in GEM reconstruction, including gene function annotation, pathway analysis, and gap-filling prediction.
We further discuss the potential of machine learning to predict kinetic, thermodynamic, and other key biochemical parameters for reconstructing multi-constraint and multi-process models. By combining GEMs with machine learning, researchers can improve model accuracy and scalability, bridge gaps in metabolic knowledge, and gain new insights into previously elusive metabolic mechanisms, underscoring the importance of this combination as a cornerstone for future developments in systems biology and biotechnology.
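The core GEM calculation mentioned above, combining a stoichiometric matrix S with constraint-based optimization, is usually posed as flux balance analysis (FBA): maximize an objective flux (commonly biomass) subject to the steady-state mass balance S·v = 0 and lower/upper bounds on every flux in v. The minimal sketch below illustrates this on a hypothetical four-reaction toy network; the reaction names EX_A, R1, R2 and BIO and all bounds are invented for illustration. To stay dependency-free it swaps the linear-programming solver used in practice for an exhaustive search over integer flux values, which is only workable for tiny networks like this one.

```python
from itertools import product

# Hypothetical toy network. Columns are reactions, rows are metabolites (A, B).
# EX_A: -> A (uptake); R1: A -> B; R2: A -> byproduct (leaves the system); BIO: B -> biomass
REACTIONS = ["EX_A", "R1", "R2", "BIO"]
S = [
    [1, -1, -1, 0],  # A: produced by EX_A, consumed by R1 and R2
    [0, 1, 0, -1],   # B: produced by R1, consumed by BIO
]
# Integer flux bounds (lower, upper); the capacity of R1 is deliberately limiting.
BOUNDS = {"EX_A": (0, 10), "R1": (0, 6), "R2": (0, 10), "BIO": (0, 10)}


def is_steady_state(S, v):
    """Return True if every metabolite is mass-balanced, i.e. S @ v == 0."""
    return all(sum(s_ij * v_j for s_ij, v_j in zip(row, v)) == 0 for row in S)


def toy_fba(S, reactions, bounds, objective):
    """Maximize one flux subject to S v = 0 and the bounds, by brute force over integer fluxes.

    Stand-in for an LP solver; only suitable for tiny illustrative networks.
    Returns the first optimal flux vector found, as a {reaction: flux} dict.
    """
    ranges = [range(bounds[r][0], bounds[r][1] + 1) for r in reactions]
    k = reactions.index(objective)
    best = None
    for v in product(*ranges):
        if is_steady_state(S, v) and (best is None or v[k] > best[k]):
            best = v
    return dict(zip(reactions, best)) if best is not None else None


solution = toy_fba(S, REACTIONS, BOUNDS, "BIO")
print(solution)  # -> {'EX_A': 6, 'R1': 6, 'R2': 0, 'BIO': 6}
```

Here the capacity of R1 caps growth, so the maximal biomass flux is 6 even though up to 10 units of A could be taken up. FBA optima are often not unique, and this search simply returns the first optimal flux vector it encounters; genome-scale models need a proper LP solver, for example through a toolbox such as COBRApy.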

Key words: genome-scale metabolic model, machine learning, synthetic biology, metabolic modeling, multi-constraint and multi-process model

Abstract:

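Since the first genome-scale metabolic model (GEM) appeared in 1999, GEMs have become important tools for deciphering biological metabolism. A GEM comprises metabolic genes, metabolites, and reactions, and combines a stoichiometric matrix with constraint-based optimization to systematically describe and simulate the metabolic processes of an organism. In addition, GEMs can integrate thermodynamic parameters, kinetic parameters, multi-omics data, and multicellular processes, yielding more refined multi-constraint, multi-process models with stronger predictive power. However, limited prior knowledge has become a bottleneck for their development. With its powerful data-processing and pattern-recognition capabilities, machine learning offers new ways to further extend GEMs. This review systematically summarizes the reconstruction workflows of traditional GEMs and of multi-constraint, multi-process models, and focuses on the prospects of applying machine learning at key steps such as gene function annotation, pathway elucidation, gap-filling, and biological parameter prediction. As a new driving force, machine learning is expected to substantially improve the scale and quality of GEMs, deepen our understanding of metabolic mechanisms, and help realize digital twin cells.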
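One common way kinetic parameters enter a multi-constraint model, used for example in enzyme-constrained GEMs built with GECKO, is to cap each flux by v_j ≤ kcat_j · e_j (e_j being the enzyme amount) and the total enzyme mass by Σ_j MW_j · e_j ≤ P. For a linear pathway in which every step carries the same flux v, this collapses to v ≤ P / Σ_j (MW_j / kcat_j). The minimal sketch below evaluates that bound; the function name max_pathway_flux and all parameter values are hypothetical, chosen only to show how directly kcat values, one of the parameters machine learning is used to predict, shape the model's predictions.

```python
def max_pathway_flux(kcats, mws, protein_budget):
    """Largest steady-state flux v through a linear pathway under a total enzyme budget.

    kcats: turnover number of each step (1/h)
    mws: molecular weight of each step's enzyme (g/mmol)
    protein_budget: enzyme mass available to the pathway (g/gDW)
    Step j needs at least e_j = v / kcat_j enzyme (mmol/gDW), costing mws[j] * e_j grams,
    so sum_j mws[j] * v / kcats[j] <= protein_budget gives v <= budget / sum_j(mws[j] / kcats[j]).
    """
    if len(kcats) != len(mws):
        raise ValueError("kcats and mws must have the same length")
    cost_per_unit_flux = sum(mw / kcat for kcat, mw in zip(kcats, mws))
    return protein_budget / cost_per_unit_flux


# Hypothetical three-step pathway: kcat = 1 s^-1 (3600 h^-1) and 40 kDa (40 g/mmol) per enzyme,
# with 0.1 g/gDW of protein allotted to the pathway.
v_max = max_pathway_flux([3600.0, 3600.0, 3600.0], [40.0, 40.0, 40.0], 0.1)
print(round(v_max, 6))  # -> 3.0

# Doubling the kcat of the first step lowers its enzyme cost and raises the bound.
print(round(max_pathway_flux([7200.0, 3600.0, 3600.0], [40.0, 40.0, 40.0], 0.1), 6))  # -> 3.6
```

In a full enzyme-constrained GEM these constraints are added to the FBA linear program rather than solved in closed form, but the arithmetic shows why an error in a single kcat propagates straight into the predicted flux.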

Key words: genome-scale metabolic model, machine learning, synthetic biology, metabolic modeling, multi-constraint and multi-process model
