Synthetic Biology Journal ›› 2024, Vol. 5 ›› Issue (1): 88-106. DOI: 10.12211/2096-8280.2023-074

• Invited Review •

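Advances in applications of deep learning for predicting sequence-based protein interactions

Jingyong ZHU1,2,3, Junxiang LI3,4, Xuhui LI3,5, Jin ZHANG2, Wenjing WU2

  1. College of Life Sciences and Medicine, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China
  2. College of Biological Chemical Sciences and Engineering, Jiaxing University, Jiaxing 314000, Zhejiang, China
  3. Agecode R&D Center, Yangtze Delta Region Institute of Tsinghua University, Jiaxing 341001, Zhejiang, China
  4. Harvest Biotech. Co., Ltd., Jiaxing 341001, Zhejiang, China
  5. Zhejiang Provincial Key Laboratory of Applied Enzymology, Yangtze Delta Region Institute of Tsinghua University, Jiaxing 314006, Zhejiang, China
  • Received: 2023-10-24 Revised: 2023-11-28 Online: 2024-02-29 Published: 2024-03-20
  • Corresponding author: Jin ZHANG
  • About the authors: Jingyong ZHU (b. 1998), male, master's student. Research interest: deep learning for predicting protein interactions. E-mail: jingyongzhu2016@163.com
    Jin ZHANG (b. 1976), male, professor and master's supervisor. Research interest: pathogenic mechanisms of metabolic diseases and drug target discovery. E-mail: zhangjin7688@163.com
  • Funding:
    National Natural Science Foundation of China (32172708); Key Project of the Zhejiang Provincial Natural Science Foundation (LZ23C170002)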

Advances in applications of deep learning for predicting sequence-based protein interactions

Jingyong ZHU1,2,3, Junxiang LI3,4, Xuhui LI3,5, Jin ZHANG2, Wenjing WU2   

  1. College of Life Sciences and Medicine, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China
  2. College of Biological Chemical Sciences and Engineering, Jiaxing University, Jiaxing 314000, Zhejiang, China
  3. Agecode R&D Center, Yangtze Delta Region Institute of Tsinghua University, Jiaxing 341001, Zhejiang, China
  4. Harvest Biotech. Co., Ltd., Jiaxing 341001, Zhejiang, China
  5. Zhejiang Provincial Key Laboratory of Applied Enzymology, Yangtze Delta Region Institute of Tsinghua University, Jiaxing 314006, Zhejiang, China
  • Received: 2023-10-24 Revised: 2023-11-28 Online: 2024-02-29 Published: 2024-03-20
  • Contact: Jin ZHANG

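Abstract:

Protein-protein interactions play important roles in biological processes such as cell signal transduction, gene expression, and metabolic regulation, and identifying them is essential for understanding complex biological processes. Predicting protein-protein interactions can support fields such as drug discovery and protein function research and design. In recent years, with the rapid development of artificial intelligence, deep learning has contributed greatly to protein-protein interaction prediction; sequence-based deep learning models in particular predict interactions by learning deep features from protein sequence information. This article reviews the application of deep learning to sequence-based protein-protein interaction prediction, organizes progress in the field by algorithmic framework and timeline, introduces data processing, sequence encoding methods, algorithm architectures, and model evaluation metrics, and analyzes current problems and future directions. As deep learning has advanced, the efficiency of protein-protein interaction prediction has improved substantially; models with stronger generalization ability need to be developed in the future to support protein-protein interaction prediction.

Keywords: protein interactions, deep learning, artificial intelligence, sequence encoding, neural network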

Abstract:

Protein-protein interactions play a crucial role in biological processes such as cell signal transduction, gene expression, and metabolic regulation, so identifying them is essential for understanding these complex processes. Predicting protein-protein interactions is an active research topic of great significance that can assist drug discovery as well as protein function research and design. In recent years, with the development of artificial intelligence, machine learning has gradually been applied to protein-protein interaction prediction and has shown good potential. However, when processing large amounts of protein information, traditional machine learning methods struggle to mine intrinsic patterns and latent features, which motivates the use of deep learning. Compared with three-dimensional protein structures, sequence information is easier to obtain, and high-throughput sequencing provides abundant protein sequence data, which has greatly facilitated the development of sequence-based deep learning methods. Sequence-based deep learning models predict protein-protein interactions by learning intrinsic patterns and features from protein sequences, which greatly improves prediction efficiency and accuracy. In this review, we focus on the progress of deep learning in sequence-based prediction of protein-protein interactions, categorized and summarized according to algorithmic framework and timeline. We briefly describe dataset construction methods and model evaluation metrics, discuss sequence encoding methods and common algorithmic architectures in detail, and present computational models based on various types of algorithms together with their features and advantages. Finally, we analyze current challenges in predicting protein-protein interactions with deep learning and discuss possible solutions. With the development of deep learning, the efficiency of predicting protein-protein interactions has increased dramatically. Looking ahead, models with stronger generalization and more robust prediction capabilities are needed to support protein-protein interaction prediction.

Key words: protein interactions, deep learning, artificial intelligence, sequence encoding, neural network
