Synthetic Biology Journal ›› 2023, Vol. 4 ›› Issue (3): 464-487. DOI: 10.12211/2096-8280.2023-008

• Invited Review •

Research progress of artificial intelligence in designing protein structures

Zhihang CHEN, Menglin JI, Yifei QI   

  1. School of Pharmacy, Fudan University, Shanghai 201203, China
  • Received: 2023-01-13 Revised: 2023-03-15 Online: 2023-07-05 Published: 2023-06-30
  • Contact: Yifei QI

Research progress of artificial intelligence algorithms for protein structure design

Zhihang CHEN, Menglin JI, Yifei QI

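  1. School of Pharmacy, Fudan University, Shanghai 201203
  • Corresponding author: Yifei QI
  • About the authors: Zhihang CHEN (b. 1998), male, master's student. Research interest: artificial intelligence protein design. E-mail: zhihangchen21@m.fudan.edu.cn
    Menglin JI (b. 2000), male, master's student. Research interest: artificial intelligence protein design. E-mail: 22211030067@m.fudan.edu.cn
    Yifei QI (b. 1983), male, associate researcher, master's supervisor. Research interests: simulation of the structure and function of biological macromolecules, and artificial intelligence drug design. E-mail: yfqi@fudan.edu.cn
  • Funding:
    National Natural Science Foundation of China (22033001)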

Abstract:

Proteins are essential to life because they carry out a great variety of biological functions. A protein's sequence determines its three-dimensional structure, and therefore its physiological function. Proteins with specific functions have important applications in many fields such as biomedicine, where they are used in drug design and delivery. Protein engineering and directed evolution have long been used to improve the activity and stability of proteins. These methods, however, are both complex and expensive, as they require a large number of biological experiments for validation. Computational protein design (CPD) allows amino acid sequences to be designed from desired protein functions and structures and, more intriguingly, can generate proteins not found in nature. Conventional CPD uses energy functions and optimization algorithms to design protein sequences. In recent years, with the rapid development of artificial intelligence (AI) techniques, the accumulation of big data, and advances in high-speed computing, AI has made great progress and has been successfully applied to CPD. In this review, we organize recent applications of AI in protein design by input constraints and sampling space size, and present a systematic overview from three aspects: fixed-backbone design, flexible-backbone design, and sequence-structure generation. We focus on algorithms and protein feature encoding, describe how dataset size and architectural improvements affect the predictive performance of the models, and showcase several enzymes, antibodies, and binding proteins that were successfully designed using these models. The advantages of AI over traditional CPD methods are also discussed. Finally, we highlight challenges in AI-aided protein design and propose strategies to address them.

Key words: protein design, protein engineering, artificial intelligence, deep learning, protein sequence and structure

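Chinese abstract:

Proteins are the indispensable carriers of all kinds of life activities, and their sequences determine their folded three-dimensional structures and functions. Proteins with specific functions have important applications in many fields such as biomedicine. Computational protein design can design amino acid sequences according to the desired protein function and structure, and can generate proteins that do not exist in nature. Conventional computational protein design usually obtains designed sequences using an energy function and a specific search and optimization algorithm. In recent years, with the development of advanced algorithms, the accumulation of big data, and the growth of hardware computing power, artificial intelligence has flourished and has gradually been applied to protein design. This article reviews recent progress of artificial intelligence in protein structure design, with an emphasis on introducing the various algorithms. It surveys the latest protein structure design algorithms from three aspects: fixed-backbone design, flexible-backbone design, and sequence-structure generation, and explains their novelty and innovation relative to conventional computational methods. Empowered by artificial intelligence, the success rate and rationality of protein design have improved substantially, and the era of on-demand design of functional proteins is approaching.

Key words: protein design, protein engineering, artificial intelligence, deep learning, protein sequence and structure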

CLC Number: