Synthetic Biology Journal, 2023, Vol. 4, Issue (3): 507-523. DOI: 10.12211/2096-8280.2022-079

• Invited Review •

Prediction of protein complex structure: methods and progress

He HUANG¹, Tong WU¹, Wenda WANG¹, Jiashan LI¹, Daiwen SUN¹, Qiwei YE², Xinqi GONG¹,²

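  1. Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
  2. Beijing Academy of Artificial Intelligence, Beijing 100084, China
  • Received: 2022-12-31 Revised: 2023-03-20 Online: 2023-06-30 Published: 2023-07-05
  • Corresponding author: Xinqi GONG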
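  • About the authors: He HUANG (born 1995), male, Ph.D. candidate. His research interests include algorithms for protein complex structure prediction. E-mail: hehuang@ruc.edu.cn
    Xinqi GONG (born 1978), male, professor and doctoral supervisor. In bioinformatics, his group (Mathematical Intelligence Application Lab) builds mathematical models and develops computational methods, applying them to study the structures, networks, and dynamics of large multimeric protein interaction complexes. In machine learning, the group uses deep learning and big-data methods to design new algorithmic frameworks for computational problems involving biological macromolecules and medical images. E-mail: xinqigong@ruc.edu.cn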


Abstract:

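Protein complexes are formed through interactions between different protein chains, and many proteins in nature carry out their functions by forming complexes. Accurately predicting complex structures is therefore essential for understanding and controlling these functions. Over the past two years, structure prediction for single protein chains has made breakthrough progress, and the accuracy of predicting protein structure from the amino acid sequence has improved substantially. Compared with monomeric proteins, however, protein complex structure prediction remains much less accurate. This review summarizes algorithms for protein complex structure prediction and introduces recent advances. We first briefly introduce AI algorithms used in protein structure prediction, covering coevolution analysis and protein contact prediction, deep learning methods and protein structure prediction, and pretrained models and protein representation learning. We then systematically summarize the basic methods for predicting inter-chain interactions in protein complexes, from constructing multiple sequence alignments (MSAs) of complexes to predicting inter-chain residue contacts in homomeric or heteromeric complexes. Finally, we describe the basic methods and ideas of protein complex structure prediction from three perspectives: interaction-site-guided complex structure prediction, protein molecular docking algorithms, and end-to-end complex structure prediction methods. Overall, the accuracy of current protein complex structure prediction is not yet high enough. Effectively addressing problems such as MSA pairing and template search for multimeric complexes, or adopting a new paradigm that combines pretrained models with large amounts of sequence or structure data, is a reasonable and effective way forward. Better protein complex structure prediction has promising applications in synthetic biology, for example in antibody design and drug discovery.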
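The MSA pairing problem named in the abstract can be illustrated with one common heuristic: pair homologs of the two chains that come from the same species, keeping only the best-ranked hit per species. The following is a minimal sketch under stated assumptions, not the method of any particular tool or of this review. The input format (lists of `(species_id, aligned_sequence)` tuples, query first, hits sorted by similarity to the query), the function name `pair_by_species`, and the species labels in the demo are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: each MSA row is (species_id, aligned_sequence),
# row 0 is the query, and later rows are sorted by similarity to the query.
MsaRows = List[Tuple[str, str]]


def pair_by_species(msa_a: MsaRows, msa_b: MsaRows) -> List[str]:
    """Build a paired (concatenated) MSA from two single-chain MSAs.

    Rows are paired only when both hits come from the same species; for each
    species, only the first (best-ranked) hit in each chain is used.
    """
    # The two query sequences always form the first paired row.
    paired = [msa_a[0][1] + msa_b[0][1]]

    # Best-ranked chain-B hit per species (first occurrence wins).
    best_b: Dict[str, str] = {}
    for species, seq in msa_b[1:]:
        best_b.setdefault(species, seq)

    used = set()
    for species, seq_a in msa_a[1:]:
        # Skip species absent from chain B and species already paired.
        if species in best_b and species not in used:
            paired.append(seq_a + best_b[species])
            used.add(species)
    return paired


if __name__ == "__main__":
    msa_a = [("query", "MKV-L"), ("ecoli", "MKVAL"), ("yeast", "MRV-L"), ("ecoli", "MKIAL")]
    msa_b = [("query", "GSTW"), ("yeast", "GS-W"), ("ecoli", "ASTW"), ("human", "GSTF")]
    for row in pair_by_species(msa_a, msa_b):
        print(row)
```

In the demo, the second `ecoli` hit for chain A is dropped because that species is already paired, and the `human` hit for chain B has no partner. Species matching is only a proxy for finding interacting orthologs and can mispair paralogs, which is why the pairing step deserves dedicated algorithms.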


Abstract:

Protein complexes carry out a wide variety of biological functions, and obtaining their three-dimensional structures is critical for understanding these functions. Proteins can interact not only in pairs to form dimers but also in larger numbers to form multimers. Determining the structures of protein complexes experimentally is difficult and time-consuming. Recently, several methods have been proposed that build on monomer structure prediction to predict multimer structures. In CASP14, several groups submitted predictions for protein complex targets, mainly using template-based methods or protein docking. Subsequently, researchers developed end-to-end complex structure prediction methods on the basis of AlphaFold2, accelerating research on protein complex structure prediction. However, complex structure prediction remains less accurate than monomer structure prediction. This review surveys recent methods and advances in protein complex structure prediction, including inter-chain residue contact prediction, protein docking, and end-to-end complex structure prediction. First, AI algorithms for protein structure prediction are briefly introduced, covering coevolution analysis and protein contact prediction, deep learning methods and protein structure prediction, and pretrained models and protein representation learning. Second, basic methods for predicting inter-chain interactions in protein complexes are systematically summarized, from constructing multiple sequence alignments (MSAs) of complexes to predicting inter-chain residue contacts in homomeric or heteromeric complexes.
Finally, basic methods and ideas for protein complex structure prediction are discussed from three perspectives: interaction-site-guided complex structure prediction, protein docking algorithms, and end-to-end complex structure prediction methods. To predict protein complex structures better, future efforts should focus on: 1) constructing protein complex datasets for training and evaluating multimer structure prediction methods; 2) developing efficient algorithms that improve prediction accuracy, such as MSA pairing algorithms and template construction for multi-chain protein complexes; and 3) expanding protein sequence and structure databases so that protein complexes can be modeled better with pretraining and self-supervised learning methods. In summary, protein complex structure prediction remains challenging, and new methods that improve its accuracy will help in analyzing protein functions, designing proteins, and discovering drugs.

Key words: protein complex, protein-protein interaction, inter-chain contact prediction, protein docking, structure prediction
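Inter-chain contact prediction from coevolution, named in the abstract, rests on the idea that residues in contact across an interface tend to mutate in a correlated way, which appears as statistical dependence between columns of a paired MSA. The sketch below assumes an already paired MSA in which each row is chain A's aligned sequence followed by chain B's, and scores every inter-chain column pair by mutual information (MI), the simplest such statistic. The function names are hypothetical. Practical methods add sequence reweighting, pseudocounts, corrections such as the average product correction, global statistical models (e.g., direct coupling analysis), or deep networks; none of these are shown here.

```python
import math
from collections import Counter
from typing import List, Tuple


def mutual_information(x: List[str], y: List[str]) -> float:
    """Mutual information (nats) between two alignment columns.

    Uses raw frequencies: no pseudocounts and no sequence reweighting.
    Gap characters are treated as ordinary symbols.
    """
    n = len(x)
    fx, fy, fxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in fxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((fx[a] / n) * (fy[b] / n)))
    return mi


def interchain_mi(paired: List[str], len_a: int) -> List[Tuple[int, int, float]]:
    """Score every (chain-A column i, chain-B column j) pair by MI.

    `paired` rows are chain A's aligned sequence followed by chain B's;
    returned indices are 0-based within each chain, highest score first.
    """
    len_b = len(paired[0]) - len_a
    columns = list(zip(*paired))  # columns[k] holds the residues at position k
    scores = [
        (i, j, mutual_information(list(columns[i]), list(columns[len_a + j])))
        for i in range(len_a)
        for j in range(len_b)
    ]
    # Highest-MI pairs are the candidate inter-chain contacts.
    return sorted(scores, key=lambda s: s[2], reverse=True)


if __name__ == "__main__":
    # Toy paired MSA: chain A has 2 columns, chain B has 2 columns.
    paired = ["AKGL", "DKSE", "ARSL", "DRGE"]
    for i, j, s in interchain_mi(paired, len_a=2)[:3]:
        print(f"A{i} - B{j}: MI = {s:.3f}")
```

In this toy alignment, column 0 of chain A and column 1 of chain B covary perfectly (A with L, D with E), so that pair ranks first with MI = ln 2 ≈ 0.693, while every other inter-chain pair is statistically independent and scores 0.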
