Synthetic Biology Journal ›› 2023, Vol. 4 ›› Issue (3): 488-506. DOI: 10.12211/2096-8280.2022-078

• Invited Review •

Application of deep learning in protein function prediction

Yidong SONG, Qianmu YUAN, Yuedong YANG

  1. School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510000, Guangdong, China
  • Received: 2022-12-31 Revised: 2023-03-07 Online: 2023-06-30 Published: 2023-07-05
  • Contact: Yuedong YANG
  • About the authors: Yidong SONG (b. 1998), male, PhD candidate. Research interests include protein function prediction and protein disorder prediction. E-mail: songyd6@mail2.sysu.edu.cn
    Yuedong YANG (b. 1980), male, professor and doctoral supervisor. Research interests include high-performance bioinformatics computing, protein structure and function prediction, intelligent drug design, cross-scale multi-omics data mining, and biomedical supercomputing platforms. E-mail: yangyd25@mail.sysu.edu.cn
  • Funding:
    National Key Research and Development Program of China (2022YF1203100); National Natural Science Foundation of China (12126610)

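Summary:

Protein function prediction is an important task in bioinformatics and plays a key role in areas such as elucidating disease mechanisms and discovering drug targets. Because traditional biochemical experiments for determining protein function are usually costly, time-consuming, and low-throughput, developing efficient and accurate computational methods for protein function prediction is of great importance. Protein function prediction can be divided into residue-level binding site prediction and protein-level gene ontology (GO) prediction. This review first introduces the databases and protein features commonly used in the field, and then summarizes the latest protein function prediction methods. For binding site prediction, the latest methods are presented by ligand type: protein-protein, protein-peptide, protein-nucleic acid, and protein-small molecule or ion binding sites. For GO prediction, recent methods are presented by category: sequence-based, structure-based, and protein-protein interaction network-based. Finally, the current methods are summarized, their strengths and weaknesses are analyzed, and future directions for the field are discussed.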


Abstract:

Protein function prediction is an essential task in bioinformatics that benefits a wide range of biological studies, such as understanding the functions of metagenomes, uncovering the mechanisms underlying diseases, and finding new drug targets. With the rapid development of high-throughput sequencing technology, the volume of protein sequence data has grown quickly, but the functions of most proteins have not yet been identified. Since traditional biochemical experiments for determining protein function are usually expensive, time-consuming, and low-throughput, developing efficient and accurate computational methods for protein function prediction is of great significance. Deep learning has made breakthroughs in many fields, including image recognition, natural language processing, genomic analysis, and drug discovery. In this review, we address applications of deep learning in protein function prediction, which can be divided into residue-level binding site prediction and protein-level gene ontology (GO) prediction. Protein binding sites are regions that bind to specific ligands; they play an important role in signal transduction and metabolism, and identifying them helps to reveal the molecular mechanisms underlying diseases and to design new drugs. Gene ontology is a standard functional classification system for genes that provides a set of annotations to comprehensively describe the properties of genes and gene products. First, we introduce commonly used large-scale protein structure and function databases. Second, we describe discriminative protein sequence and structure features. Third, we summarize the latest protein function prediction methods. For binding site prediction, we introduce the latest methods by ligand type, covering protein, peptide, nucleic acid, and small molecule or ion ligands. For GO prediction, we highlight the latest sequence-based, structure-based, and protein interaction network-based methods. Finally, we comment on the advantages and disadvantages of current protein function prediction methods and discuss future developments in this field.

Key words: deep learning, protein, function prediction, binding site, gene ontology
