Synthetic Biology Journal ›› 2023, Vol. 4 ›› Issue (3): 535-550. DOI: 10.12211/2096-8280.2022-066

• Invited Review

Data-driven prediction and design for enzymatic reactions

Tao ZENG, Ruibo WU   

  1. School of Pharmaceutical Science, Sun Yat-Sen University, Guangzhou 510006, Guangdong, China
  • Received: 2022-11-23 Revised: 2022-12-27 Online: 2023-07-05 Published: 2023-06-30
  • Contact: Ruibo WU

数据驱动的酶反应预测与设计

曾涛, 巫瑞波   

  1. School of Pharmaceutical Science, Sun Yat-Sen University, Guangzhou 510006, Guangdong
  • Corresponding author: Ruibo WU (巫瑞波)
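  • About the authors: Tao ZENG (曾涛; born 1995), male, PhD candidate. Research interest: computation-driven design and optimization of biosynthetic routes. E-mail: zengt28@mail2.sysu.edu.cn
    Ruibo WU (巫瑞波; born 1984), male, professor and doctoral supervisor. Research interest: intelligent biomanufacturing of terpenoid natural products and mining of their pharmacological effects, based on multiscale simulation. E-mail: wurb3@mail.sysu.edu.cn
  • Funding:
    Key Research and Development Program of Guangdong Province (2022B1111080005)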

Abstract:

Enzymes are efficient catalysts that combine substrate specificity with stereo- and regioselectivity, and they are widely used to produce chemicals, drugs, and materials. Because enzymes are at the core of biocatalysis, predicting their functions and designing enzymatic reactions are driving forces for intelligent biomanufacturing. However, our limited understanding of enzymatic catalysis still hinders the exploration of enzymatic reactions for industrial applications. For example, it remains difficult to predict enzymatic activity on unreported substrates, to elucidate synthetic routes for newly discovered enzyme structures, and to redesign enzymes for specific scenarios. In the era of big data, data-driven approaches have shown powerful capabilities for exploring enzymatic reactions by bridging the gap between the large corpora of enzymatic data and the limited understanding of enzyme function. Recently, computational tools and platforms have greatly accelerated experimental research and improved the design-build-test-learn cycle. Herein we review progress in computational tools for enzymatic reaction prediction and design, focusing on applications of deep learning in this field. We first summarize databases organized around the key elements of enzymatic reactions (substrate, product, and enzyme). We then address data-driven approaches for forward and backward prediction of enzymatic reaction routes, prediction of enzyme function, enzyme design, and theoretical calculation of enzymatic catalysis. Finally, we discuss the current status and prospects of data-driven approaches for predicting and designing enzymatic catalysis, covering data, models, algorithms, and platforms.

Key words: big data, machine learning, enzymatic catalysis, enzyme design, biosynthesis

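Abstract (Chinese version):

Enzymatic catalysis is finding increasingly wide use in the production of daily-use chemicals, drugs, and functional materials. Enzymes are the core "chips" of biomanufacturing, and the prediction and design of the reactions they catalyze is one of the key driving forces in moving traditional biomanufacturing toward intelligent biomanufacturing. Our understanding of natural enzymatic catalysis nevertheless remains very limited, which severely hinders the exploration and use of enzymatic reaction space. With the arrival of the big-data era, data-driven computational simulation has become an important means of mining new enzymatic reaction space and of optimizing and designing enzyme function. The development of computational tools and platforms is greatly accelerating and empowering experimental research across enzymology-related fields. This article reviews methods for predicting and designing the substrates, products, and enzymes involved in enzymatic catalysis, outlines recent databases related to enzymatic reactions, summarizes and compares data-driven tools for enzymatic reaction design, highlights applications of deep learning in this field, and discusses the prospects of data-driven computational methods for enzymatic reaction prediction and design from the perspectives of data, models, algorithms, and platforms.

Key words (Chinese version): big data, machine learning, enzymatic catalysis, enzyme design, biosynthesis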
