合成生物学 (Synthetic Biology Journal), 2021, Vol. 2, Issue 1: 1-14. DOI: 10.12211/2096-8280.2020-074

• Invited Review •

Design of biomolecular sequences by artificial intelligence (生物分子序列的人工智能设计)

Ye WANG (王也), Haochen WANG (王昊晨), Minghao YAN (晏明皓), Guanhua HU (胡冠华), Xiaowo WANG (汪小我)

  1. Department of Automation, Tsinghua University, Center for Synthetic and System Biology, Ministry of Education Key Laboratory of Bioinformatics, Beijing National Research Center for Information Science and Technology, Beijing 100084, China
  • Received: 2020-07-09 Revised: 2020-11-15 Online: 2021-02-28 Published: 2021-03-12
  • Contact: Xiaowo WANG
  • About the authors: Ye WANG (born 1995), female, PhD candidate; research interests: pattern recognition and machine learning, bioinformatics, and synthetic biology. E-mail: wangy17@mails.tsinghua.edu.cn | Xiaowo WANG (born 1980), male, PhD, professor; research interests: pattern recognition and machine learning, bioinformatics, and synthetic biology. E-mail: xwwang@tsinghua.edu.cn
  • Funding:
    National Key Research and Development Program of China (2020YFA0906901); National Natural Science Foundation of China (61773230)

Abstract:

Guided by the idea of learning from, transforming, and transcending nature, synthetic biology aims to optimize, modify, and recombine genetic elements to build synthetic biological systems that meet specific needs. Obtaining high-performance biological components is the basis for building and controlling such systems. In recent years, synthetic biomolecules have been widely used in areas such as metabolic engineering and gene therapy. How to efficiently search the vast sequence space for biomolecular sequences with specific biological functions, and how to design them, is an important scientific question for synthetic biology. With the rapid development of artificial intelligence, intelligent algorithms have shown great potential for mining complex biological features and designing biomolecules. This review analyzes how deep generative models are applied to the design of different artificial biological sequences, from the perspective of exploring the spaces of new drug molecules, nucleic acid sequences, and protein sequences under the guidance of complex feature patterns discovered by deep learning. Drawing on application cases in the design of small-molecule compounds, nucleic acids, and proteins, it then summarizes and analyzes directed optimization strategies for designing artificial biomolecular sequences. To assess model-designed sequences, it systematically analyzes the evaluation schemes used from different perspectives in different fields. Artificial biological sequences are an important carrier for writing information into synthetic living systems, and how they interact with the complex multi-level regulation in the cell remains an important open question. Looking ahead, the intelligent design of artificial biological sequences needs to take full account of the tightly coupled, multi-level regulation that characterizes biological systems. Designing and optimizing biological sequences at different levels from a systems perspective should help elucidate regulation at each level and advance the overall intelligent adaptation and optimization of biological sequences and cell chassis environments.

Key words: synthetic biology, intelligent design, biological element design, deep learning, intelligent optimization
