Synthetic Biology Journal, 2024, Vol. 5, Issue (1): 88-106. DOI: 10.12211/2096-8280.2023-074
Jingyong ZHU1,2,3, Junxiang LI3,4, Xuhui LI3,5, Jin ZHANG2, Wenjing WU2
Received: 2023-10-24
Revised: 2023-11-28
Online: 2024-02-29
Published: 2024-03-20
Contact: Jin ZHANG
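Abstract:
Protein-protein interactions (PPIs) play important roles in biological processes such as cell signal transduction, gene expression, and metabolic regulation, and identifying them is essential for understanding complex biological processes. Predicting PPIs can support fields such as drug discovery and protein function research and design. With the rapid development of artificial intelligence in recent years, deep learning has contributed substantially to PPI prediction; in particular, sequence-based deep learning models predict interactions by learning deep features from protein sequence information. This review surveys the application of deep learning to sequence-based PPI prediction, classifies progress in the field by algorithmic framework and timeline, and introduces data processing, sequence encoding methods, model architectures, and evaluation metrics. It also analyzes current challenges and future directions. Advances in deep learning have greatly improved the efficiency of PPI prediction; future work should develop models with stronger generalization ability to further advance the field.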
Jingyong ZHU, Junxiang LI, Xuhui LI, Jin ZHANG, Wenjing WU. Advances in applications of deep learning for predicting sequence-based protein interactions[J]. Synthetic Biology Journal, 2024, 5(1): 88-106.
Database name | Description | URL | Last update | Reference |
---|---|---|---|---|
BioGRID | Protein-centric interaction database | https://thebiogrid.org | 2023 | [ |
DIP | PPIs verified experimentally and confirmed in the literature | https://dip.doe-mbi.ucla.edu/dip | 2020 | [ |
HIPPIE | More than 60 000 manually integrated PPIs | https://cbdm.uni-mainz.de/hippie | 2023 | [ |
HPRD | Human PPI database containing 41 327 interacting pairs | https://hprd.org | 2010 | [ |
HVIDB | 48 643 experimentally verified human-virus PPIs | http://zzdlab.com/hvidb/ | 2020 | [ |
IntAct | 1 194 594 interactions curated from 22 954 publications | https://www.ebi.ac.uk/intact/home | 2022 | [ |
Negatome | Non-interacting protein pairs derived from literature curation and analysis of 3D structures of protein complexes | http://mips.helmholtz-muenchen.de/proj/ppi/negatome | 2014 | [ |
STRING | PPI database covering 67 592 464 proteins from 14 094 organisms | https://cn.string-db.org | 2023 | [ |
Table 1 Public databases and basic information
Type | Encoding method | Form | Dimension | Reference |
---|---|---|---|---|
Sequence composition | Dipeptide composition | 1D vector | 400 | [ |
 | Amino acid composition | 1D vector | 20 | [ |
 | Pseudo-amino acid composition | 1D vector | 20+L, L = maximum lag | [ |
 | Conjoint triad | 1D vector | 343 | [ |
Autocorrelation | Auto covariance | 1D vector | L × number of physicochemical properties, L = maximum lag | [ |
 | Cross covariance | 1D vector | L×N×(N-1), L = maximum lag, N = number of physicochemical properties | [ |
 | Auto-cross covariance | 1D vector | L×N×N, L = maximum lag, N = number of physicochemical properties | [ |
 | Moran autocorrelation | 1D vector | L × number of physicochemical properties, L = maximum lag | [ |
 | Geary autocorrelation | 1D vector | L × number of physicochemical properties, L = maximum lag | [ |
 | Physicochemical distance transformation | 1D vector | 531×β, β = maximum gap distance | [ |
Evolutionary information | Position-specific scoring matrix (PSSM) | 2D matrix | 20×N, N = sequence length | [ |
 | ACC-PSSM | 1D vector | 400×L, L = maximum lag | [ |
 | Top-n-grams | 1D vector | L×N, L = sequence length, N = threshold | [ |
 | BLOSUM62 | 2D matrix | 20×N, N = sequence length | [ |
 | Pseudo-PSSM | 1D vector | 40 | [ |
Table 2 Protein encoding methods in sequence-based PPI prediction models
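The sequence-composition and autocorrelation encodings in Table 2 turn a variable-length sequence into a fixed-length vector. The sketch below computes four of them in plain Python. It is a minimal illustration under stated assumptions, not the implementation used by any model in Table 3: the seven conjoint-triad classes follow the grouping used by Shen et al. (2007), Kyte-Doolittle hydropathy stands in as the single physicochemical property for auto covariance, the per-property normalization used in published pipelines is omitted, and frequencies are simply divided by the number of windows. The function names (`aac`, `dpc`, `conjoint_triad`, `auto_covariance`) are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Seven conjoint-triad classes (grouping as used by Shen et al., 2007)
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CT_INDEX = {aa: i for i, group in enumerate(CT_CLASSES) for aa in group}

# Kyte-Doolittle hydropathy, assumed here as the single physicochemical property
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def clean(seq):
    """Upper-case the sequence and reject anything but the 20 standard residues."""
    s = seq.upper()
    if not s or any(aa not in AMINO_ACIDS for aa in s):
        raise ValueError("expected a non-empty sequence of standard residues")
    return s


def aac(seq):
    """Amino acid composition: 20 relative frequencies."""
    s = clean(seq)
    return [s.count(aa) / len(s) for aa in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: 400 frequencies of adjacent residue pairs."""
    s = clean(seq)
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for a, b in zip(s, s[1:]):
        counts[a + b] += 1
    total = max(len(s) - 1, 1)  # avoid division by zero for length-1 input
    return [counts[p] / total for p in pairs]


def conjoint_triad(seq):
    """Conjoint triad: 7**3 = 343 frequencies of consecutive class triplets."""
    classes = [CT_INDEX[aa] for aa in clean(seq)]
    counts = [0] * 343
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        counts[a * 49 + b * 7 + c] += 1  # base-7 index of the triplet
    total = max(len(classes) - 2, 1)
    return [n / total for n in counts]


def auto_covariance(seq, max_lag=5, scale=HYDROPATHY):
    """Auto covariance of one property at lags 1..max_lag (one value per lag)."""
    values = [scale[aa] for aa in clean(seq)]
    n = len(values)
    mean = sum(values) / n
    features = []
    for lag in range(1, max_lag + 1):
        if lag >= n:  # sequence too short for this lag
            features.append(0.0)
            continue
        terms = [(values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)]
        features.append(sum(terms) / (n - lag))
    return features


seq = "MKTAYIAKQRQISFVKSHFSRQ"
print(len(aac(seq)), len(dpc(seq)), len(conjoint_triad(seq)), len(auto_covariance(seq)))
# 20 400 343 5
```

With a single property the auto-covariance vector has one entry per lag; computing it for N properties and concatenating the results gives the L × N length listed in Table 2.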
Type | Year | Model name | Algorithm framework | Prediction type | Reference |
---|---|---|---|---|---|
DNN-based | 2018 | N/A | DNN | PPI | [ |
 | 2019 | EnsDNN | DNN | PPI | [ |
 | 2019 | N/A | DNN | PPI | [ |
 | 2019 | N/A | DNN | PPI | [ |
 | 2019 | DeepFEPPI | DNN | PPI | [ |
 | 2019 | DNN-PPI | DNN | PPI | [ |
 | 2020 | N/A | DNN | PPI | [ |
 | 2020 | N/A | DNN | PPI | [ |
 | 2022 | DNN-XGB | DNN, XGB | PPI | [ |
 | 2022 | DWPPI | DNN | PPI | [ |
 | 2022 | N/A | DNN | PPI | [ |
 | 2022 | CT-DNN | DNN | PPI | [ |
 | 2022 | N/A | DNN | PPI | [ |
CNN-based | 2018 | DPPI | CNN | PPI | [ |
 | 2019 | N/A | CNN-RF | PPI | [ |
 | 2019 | N/A | CNN | PPI | [ |
 | 2020 | Visual | CNN | PPIsite | [ |
 | 2020 | DeepPPISP | TextCNN | PPIsite | [ |
 | 2020 | EnAmDNN | CNN | PPI | [ |
 | 2020 | N/A | CNN | PPIsite | [ |
 | 2021 | TransPPI | CNN | PPI | [ |
 | 2021 | N/A | CNN | PPI | [ |
 | 2021 | D-SCRIPT | CNN | PPI | [ |
 | 2021 | CAMP | CNN | PepPI | [ |
 | 2021 | DeepViral | CNN | PPI | [ |
 | 2022 | DeepTrio | CNN | PPI | [ |
 | 2023 | EResCNN | ResCNN+RF | PPI | [ |
 | 2023 | ProtInteract | CNN | PPI | [ |
RNN and variants | 2019 | N/A | RNN | PPI | [ |
 | 2019 | DLPred | LSTM | PPIsite | [ |
 | 2019 | N/A | LSTM | PPI | [ |
 | 2020 | N/A | LSTM | PPI | [ |
 | 2021 | N/A | GRNN | PPI | [ |
 | 2021 | LSTM-PHV | LSTM | PPI | [ |
 | 2023 | N/A | RNN | PPI | [ |
Attention mechanism and Transformer | 2021 | HANPPIS | Stratified attention | PPIsite | [ |
 | 2022 | Cross-attention PHV | Cross-attention | PPI | [ |
 | 2022 | ADH-PPI | Self-attention | PPI | [ |
 | 2022 | SDNN | Self-attention | PPI | [ |
 | 2023 | EnsemPPIS | Transformer | PPIsite | [ |
Hybrid networks | 2018 | DNN-PPI | CNN+LSTM | PPI | [ |
 | 2019 | PIPR | RNN+CNN | PPI | [ |
 | 2019 | IPPI | DNN+LSTM | PPI | [ |
 | 2021 | DELPHI | RNN+CNN | PPIsite | [ |
 | 2021 | OR-RCNN | RNN+CNN | PPI | [ |
 | 2023 | MM-StackEns | SNN+GNN | PPI | [ |
Table 3 Computational models developed within the past 5 years
Fig. 3 Basic network structures of DNN, RNN, and CNN. (a) A DNN contains an input layer, multiple hidden layers, and an output layer; (b) an RNN consists of an input unit (Input), a hidden unit (Hidden), and an output unit (Output). When the RNN is unfolded in time, subscripts denote time steps: $H_t$ receives inputs from $I_t$ and $H_{t-1}$ and propagates its result to $O_t$ and $H_{t+1}$; (c) a CNN comprises an input layer, multiple convolutional and pooling layers, fully connected layers, and an output layer.
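The recurrence in panel (b) can be made concrete with a few lines of code. The following is a minimal sketch of an Elman-style RNN forward pass in plain Python, assuming the common update $H_t = \tanh(W_{ih} I_t + W_{hh} H_{t-1})$ and a linear readout $O_t = W_{ho} H_t$; bias terms are omitted, and the weight names and toy values are illustrative rather than taken from any model in Table 3.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rnn_forward(inputs, W_ih, W_hh, W_ho, h0):
    """Unrolled Elman RNN without biases.

    H_t = tanh(W_ih @ I_t + W_hh @ H_{t-1});  O_t = W_ho @ H_t
    Returns the outputs O_1..O_T and the final hidden state H_T.
    """
    h = list(h0)
    outputs = []
    for x in inputs:  # one time step per input vector I_t
        pre = [a + b for a, b in zip(matvec(W_ih, x), matvec(W_hh, h))]
        h = [math.tanh(v) for v in pre]  # H_t combines I_t and H_{t-1}
        outputs.append(matvec(W_ho, h))  # O_t is read out from H_t
    return outputs, h


# Toy example: 2-dimensional inputs, 2 hidden units, 1 output unit
W_ih = [[0.5, -0.2], [0.1, 0.4]]
W_hh = [[0.3, 0.0], [0.0, 0.3]]
W_ho = [[1.0, -1.0]]
outputs, h_last = rnn_forward([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], W_ih, W_hh, W_ho, [0.0, 0.0])
print(len(outputs), len(h_last))  # 3 2
```

Because the same weights are reused at every step, sequences of any length can be processed; LSTM and GRU cells replace the single `tanh` update with gated updates.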
Fig. 4 Basic structure of the attention mechanism and the Transformer. (a) Core operations of the attention mechanism, involving a Query, Keys, and the Value associated with each Key. A similarity score (S1 to S4) between the Query and each Key is first computed with a function F(Q, K); the scores are normalized by a Softmax function into weights for each Key (A1 to A4); the attention value is the sum of the Values (Value1 to Value4) weighted by these weights. (b) Adapted from reference [72]: the core Transformer architecture, divided into an encoder and a decoder. The encoder (left) preprocesses its input with input embedding and positional encoding and then passes it through a stack of units, each containing multi-head attention and a feed-forward network, to transform the input sequence into a context-rich continuous vector representation. The decoder (right) generates the output sequence from the context supplied by the encoder. Its input is the target sequence shifted right by one position ("Outputs (shifted right)"), which, together with masked multi-head attention, ensures that the output at each position depends only on earlier positions. The decoder likewise passes through repeated multi-head attention and feed-forward layers, and a final linear transformation and Softmax layer produce a probability distribution over the possible outputs.
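The three steps in panel (a) (score, normalize, weighted sum) can be written out directly. The sketch below assumes F(Q, K) is the scaled dot product $Q \cdot K / \sqrt{d}$ used by the Transformer [72]; other similarity functions fit the same pattern. It handles a single query in plain Python; batching, learned Q/K/V projections, and multiple heads are left out, and the function names are illustrative.

```python
import math


def softmax(scores):
    """Normalize scores into weights that are non-negative and sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Single-query attention following panel (a): scores, weights, weighted sum."""
    d = len(query)
    # S_i = F(Q, K_i), here the scaled dot product
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)  # A_i
    dim = len(values[0])
    # Attention value = sum_i A_i * Value_i
    value = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return value, weights


# Four keys and values, as in panel (a)
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
value, weights = attention(query, keys, values)
print([round(w, 3) for w in weights], round(value[0], 3))
```

Keys that are more similar to the query receive larger weights, so the attention value is pulled toward their Values; multi-head attention runs several such computations in parallel on different learned projections.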
Prediction result | Abbreviation | Description |
---|---|---|
True positive | TP | Positive sample predicted as positive |
False positive | FP | Negative sample predicted as positive |
True negative | TN | Negative sample predicted as negative |
False negative | FN | Positive sample predicted as negative |
Table 4 Four basic prediction results and their meanings
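Counting the four outcomes in Table 4 on a labeled test set is the first step of any evaluation. This sketch assumes binary labels (1 = interacting, 0 = non-interacting) and predicted probabilities that are turned into calls with a threshold; 0.5 is a common default but not a universal choice, and the function name is hypothetical.

```python
def confusion_counts(y_true, y_score, threshold=0.5):
    """Return (TP, FP, TN, FN) for binary labels (1 = interacting, 0 = not).

    y_score holds predicted probabilities; a pair is called positive when
    its score is >= threshold.
    """
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        if label not in (0, 1):
            raise ValueError(f"labels must be 0 or 1, got {label!r}")
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1  # positive sample predicted positive
        elif predicted == 1:
            fp += 1  # negative sample predicted positive
        elif label == 0:
            tn += 1  # negative sample predicted negative
        else:
            fn += 1  # positive sample predicted negative
    return tp, fp, tn, fn


print(confusion_counts([1, 1, 0, 0, 1], [0.9, 0.3, 0.2, 0.7, 0.6]))  # (2, 1, 1, 1)
```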
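Evaluation metric | Formula |
---|---|
Accuracy | $\frac{TP+TN}{TP+FP+TN+FN}$ |
Precision | $\frac{TP}{TP+FP}$ |
Sensitivity | $\frac{TP}{TP+FN}$ |
Specificity | $\frac{TN}{TN+FP}$ |
F1 score | $\frac{2\times\text{Precision}\times\text{Sensitivity}}{\text{Precision}+\text{Sensitivity}}$ |
Matthews correlation coefficient (MCC) | $\frac{TP\times TN-FP\times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$ |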
Table 5 Six evaluation metrics and their formulas
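The six metrics in Table 5 follow directly from the four counts defined in Table 4. A minimal sketch in plain Python; returning 0.0 when a denominator is zero (for example, precision when nothing is predicted positive) is a convention assumed here, and libraries differ in how they handle that case.

```python
import math


def ppi_metrics(tp, fp, tn, fn):
    """Compute the six Table 5 metrics from confusion-matrix counts."""

    def safe_div(num, den):
        # Convention assumed here: report 0.0 when the denominator is zero
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)  # also called recall
    specificity = safe_div(tn, tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }


m = ppi_metrics(tp=40, fp=10, tn=45, fn=5)
print({k: round(v, 4) for k, v in m.items()})
```

MCC uses all four counts, which is why it is often preferred over accuracy when positive and negative pairs are imbalanced.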
1 | HOSSEINI S, ILIE L. PITHIA: protein interaction site prediction using multiple sequence alignments and attention[J]. International Journal of Molecular Sciences, 2022, 23(21): 12814. |
2 | FIELDS S, STERNGLANZ R. The two-hybrid system: an assay for protein-protein interactions[J]. Trends in Genetics, 1994, 10(8): 286-292. |
3 | RIGAUT G, SHEVCHENKO A, RUTZ B, et al. A generic protein purification method for protein complex characterization and proteome exploration[J]. Nature Biotechnology, 1999, 17(10): 1030-1032. |
4 | ZHU H, BILGIN M, BANGHAM R, et al. Global analysis of protein activities using proteome chips[J]. Science, 2001, 293(5537): 2101-2105. |
5 | ZHAN X K, XIAO M, YOU Z H, et al. Predicting protein-protein interactions based on ensemble learning-based model from protein sequence[J]. Biology, 2022, 11(7): 995. |
6 | LEE H, DENG M H, SUN F Z, et al. An integrated approach to the prediction of domain-domain interactions[J]. BMC Bioinformatics, 2006, 7(1): 269. |
7 | HSIN LIU C, LI K C, YUAN S. Human protein-protein interaction prediction by a novel sequence-based co-evolution method: co-evolutionary divergence[J]. Bioinformatics, 2013, 29(1): 92-98. |
8 | KOVÁCS I A, LUCK K, SPIROHN K, et al. Network-based prediction of protein interactions[J]. Nature Communications, 2019, 10: 1240. |
9 | SMITH G R, STERNBERG M J E. Prediction of protein-protein interactions by docking methods[J]. Current Opinion in Structural Biology, 2002, 12(1): 28-35. |
10 | ZHOU J, CUI G Q, HU S D, et al. Graph neural networks: a review of methods and applications[J]. AI Open, 2020, 1: 57-81. |
11 | PENG X Q, WANG J X, PENG W, et al. Protein-protein interactions: detection, reliability assessment and applications[J]. Briefings in Bioinformatics, 2017, 18(5): 798-819. |
12 | NORTHEY T C, BAREŠIĆ A, MARTIN A C R. IntPred: a structure-based predictor of protein-protein interaction sites[J]. Bioinformatics, 2018, 34(2): 223-229. |
13 | BARANWAL M, MAGNER A, SALDINGER J, et al. Struct2Graph: a graph attention network for structure based predictions of protein-protein interactions[J]. BMC Bioinformatics, 2022, 23(1): 370. |
14 | LEE A C L, HARRIS J L, KHANNA K K, et al. A comprehensive review on current advances in peptide drug development and design[J]. International Journal of Molecular Sciences, 2019, 20(10): 2383. |
15 | WEI L Y, ZOU Q A. Recent progress in machine learning-based methods for protein fold recognition[J]. International Journal of Molecular Sciences, 2016, 17(12): 2118. |
16 | SHEN J W, ZHANG J, LUO X M, et al. Predicting protein-protein interactions based only on sequences information[J]. Proceedings of the National Academy of Sciences of the United States of America, 2007, 104(11): 4337-4341. |
17 | ZHOU C, YU H A, DING Y J, et al. Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree[J]. PLoS One, 2017, 12(8): e0181426. |
18 | LIN X T, CHEN X W. Heterogeneous data integration by tree-augmented naïve Bayes for protein-protein interactions prediction[J]. Proteomics, 2013, 13(2): 261-268. |
19 | LI J Q, YOU Z H, LI X, et al. PSPEL: In silico prediction of self-interacting proteins from amino acids sequences using ensemble learning[J]. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2017, 14(5): 1165-1172. |
20 | BAI Q F, MA J, LIU S, et al. WADDAICA: a webserver for aiding protein drug design by artificial intelligence and classical algorithm[J]. Computational and Structural Biotechnology Journal, 2021, 19: 3573-3579. |
21 | CHAI J Y, ZENG H, LI A M, et al. Deep learning in computer vision: a critical review of emerging techniques and application scenarios[J]. Machine Learning with Applications, 2021, 6: 100134. |
22 | LOPEZ M M, KALITA J. Deep learning applied to NLP[EB/OL]. arXiv, 2017: 1703.03091 [2023-10-01]. |
23 | TANG B H, PAN Z X, YIN K, et al. Recent advances of deep learning in bioinformatics and computational biology[J]. Frontiers in Genetics, 2019, 10: 214. |
24 | The UniProt Consortium. UniProt: a hub for protein information[J]. Nucleic Acids Research, 2015, 43(D1): D204-D212. |
25 | STARK C, BREITKREUTZ B J, REGULY T, et al. BioGRID: a general repository for interaction datasets[J]. Nucleic Acids Research, 2006, 34(Database issue): D535-D539. |
26 | KESHAVA PRASAD T S, GOEL R, KANDASAMY K, et al. Human protein reference database--2009 update[J]. Nucleic Acids Research, 2009, 37(Database issue): D767-D772. |
27 | XENARIOS I, SALWÍNSKI L, DUAN X J, et al. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions[J]. Nucleic Acids Research, 2002, 30(1): 303-305. |
28 | KANDEL D, MATIAS Y, UNGER R, et al. Shuffling biological sequences[J]. Discrete Applied Mathematics, 1996, 71(1/2/3): 171-185. |
29 | BLOHM P, FRISHMAN G, SMIALOWSKI P, et al. Negatome 2.0: a database of non-interacting proteins derived by literature mining, manual annotation and protein structure analysis[J]. Nucleic Acids Research, 2014, 42(D1): D396-D400. |
30 | ALANIS-LOBATO G, ANDRADE-NAVARRO M A, SCHAEFER M H. HIPPIE v2.0: enhancing meaningfulness and reliability of protein-protein interaction networks[J]. Nucleic Acids Research, 2017, 45(D1): D408-D414. |
31 | YANG X D, LIAN X Y, FU C, et al. HVIDB: a comprehensive database for human-virus protein-protein interactions[J]. Briefings in Bioinformatics, 2021, 22(2): 832-844. |
32 | KERRIEN S, ARANDA B, BREUZA L, et al. The IntAct molecular interaction database in 2012[J]. Nucleic Acids Research, 2012, 40(D1): D841-D846. |
33 | SZKLARCZYK D, GABLE A L, LYON D, et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets[J]. Nucleic Acids Research, 2019, 47(D1): D607-D613. |
34 | HE J A, WU Y L, PU X M, et al. A transfer-learning-based deep convolutional neural network for predicting leukemia-related phosphorylation sites from protein primary sequences[J]. International Journal of Molecular Sciences, 2022, 23(3): 1741. |
35 | LIU B, LIU F L, WANG X L, et al. Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences[J]. Nucleic Acids Research, 2015, 43(W1): W65-W71. |
36 | BHASIN M, RAGHAVA G P S. Classification of nuclear receptors based on amino acid composition and dipeptide composition[J]. The Journal of Biological Chemistry, 2004, 279(22): 23262-23266. |
37 | ZHANG C T, CHOU K C. An optimization approach to predicting protein structural class from amino acid composition[J]. Protein Science, 1992, 1(3): 401-408. |
38 | CHOU K C. Prediction of protein cellular attributes using pseudo-amino acid composition[J]. Proteins: Structure, Function, and Bioinformatics, 2001, 43(3): 246-255. |
39 | GUO Y Z, YU L Z, WEN Z N, et al. Using support vector machine combined with auto covariance to predict protein-protein interactions from protein sequences[J]. Nucleic Acids Research, 2008, 36(9): 3025-3030. |
40 | XIAO X, WANG P, CHOU K C. iNR-PhysChem: a sequence-based predictor for identifying nuclear receptors and their subfamilies via physical-chemical property matrix[J]. PLoS One, 2012, 7(2): e30869. |
41 | DONG Q W, ZHOU S G, GUAN J H. A new taxonomy-based protein fold recognition approach based on autocross-covariance transformation[J]. Bioinformatics, 2009, 25(20): 2655-2662. |
42 | FENG Z P, ZHANG C T. Prediction of membrane protein types based on the hydrophobic index of amino acids[J]. Journal of Protein Chemistry, 2000, 19(4): 269-275. |
43 | SOKAL R R, THOMSON B A. Population structure inferred by local spatial autocorrelation: an example from an Amerindian tribal population[J]. American Journal of Physical Anthropology, 2006, 129(1): 121-131. |
44 | LIU B, WANG X L, CHEN Q C, et al. Using amino acid physicochemical distance transformation for fast protein remote homology detection[J]. PLoS One, 2012, 7(9): e46633. |
45 | ZAHIRI J, YAGHOUBI O, MOHAMMAD-NOORI M, et al. PPIevo: protein-protein interaction prediction from PSSM based evolutionary information[J]. Genomics, 2013, 102(4): 237-242. |
46 | LIU B, WU H, CHOU K C. Pse-in-One 2.0: an improved package of web servers for generating various modes of pseudo components of DNA, RNA, and protein sequences[J]. Natural Science, 2017, 9(4): 67. |
47 | LIU B, WANG X L, LIN L, et al. A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis[J]. BMC Bioinformatics, 2008, 9: 510. |
48 | LI A, WANG L R, SHI Y Z, et al. Phosphorylation site prediction with a modified k-nearest neighbor algorithm and BLOSUM62 matrix[C/OL]//2005 IEEE Engineering in Medicine and Biology 27th Annual Conference. January 17-18, 2006, Shanghai. IEEE, 2006: 6075-6078 [2023-10-01]. |
49 | SHEN H B, CHOU K C. Nuc-PLoc: a new web-server for predicting protein subnuclear localization by fusing PseAA composition and PsePSSM[J]. Protein Engineering, Design & Selection, 2007, 20(11): 561-567. |
50 | YANG K K, WU Z, BEDBROOK C N, et al. Learned protein embeddings for machine learning[J]. Bioinformatics, 2018, 34(15): 2642-2648. |
51 | DUNKER A K, LAWSON J D, BROWN C J, et al. Intrinsically disordered protein[J]. Journal of Molecular Graphics and Modelling, 2001, 19(1): 26-59. |
52 | FONG J H, SHOEMAKER B A, GARBUZYNSKIY S O, et al. Intrinsic disorder in protein interactions: insights from a comprehensive structural analysis[J]. PLoS Computational Biology, 2009, 5(3): e1000316. |
53 | TOMPA P, FUXREITER M, OLDFIELD C J, et al. Close encounters of the third kind: disordered domains and the interactions of proteins[J]. BioEssays, 2009, 31(3): 328-335. |
54 | WEATHERITT R J, GIBSON T J. Linear motifs: lost in (pre)translation[J]. Trends in Biochemical Sciences, 2012, 37(8): 333-341. |
55 | DYSON H J, WRIGHT P E. Coupling of folding and binding for unstructured proteins[J]. Current Opinion in Structural Biology, 2002, 12(1): 54-60. |
56 | DUNKER A K, CORTESE M S, ROMERO P, et al. Flexible nets: the roles of intrinsic disorder in protein interaction networks[J]. The FEBS Journal, 2005, 272(20): 5129-5148. |
57 | LEI Y P, LI S Y, LIU Z Y, et al. A deep-learning framework for multi-level peptide-protein interaction prediction[J]. Nature Communications, 2021, 12: 5465. |
58 | MÉSZÁROS B, ERDŐS G, DOSZTÁNYI Z. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding[J]. Nucleic Acids Research, 2018, 46(W1): W329-W337. |
59 | BASU S, GSPONER J, KURGAN L. DEPICTER2: a comprehensive webserver for intrinsic disorder and disorder function prediction[J]. Nucleic Acids Research, 2023, 51(W1): W141-W147. |
60 | DEL CONTE A, BOUHRAOUA A, MEHDIABADI M, et al. CAID prediction portal: a comprehensive service for predicting intrinsic disorder and binding regions in proteins[J]. Nucleic Acids Research, 2023, 51(W1): W62-W69. |
61 | HANSON J, PALIWAL K K, LITFIN T, et al. SPOT-Disorder2: improved protein intrinsic disorder prediction by ensembled deep learning[J]. Genomics, Proteomics & Bioinformatics, 2019, 17(6): 645-656. |
62 | HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C/OL]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. June 18-23, 2018, Salt Lake City, UT, USA. IEEE, 2018: 7132-7141 [2023-10-01]. |
63 | GRAVES A. Long short-term memory[M/OL]//Supervised sequence labelling with recurrent neural networks. Berlin, Heidelberg: Springer, 2012: 37-45 [2023-10-01]. |
64 | KAWASHIMA S, POKAROWSKI P, POKAROWSKA M, et al. AAindex: amino acid index database, progress report 2008[J]. Nucleic Acids Research, 2008, 36(Database issue): D202-D205. |
65 | ALTSCHUL S F, MADDEN T L, SCHÄFFER A A, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs[J]. Nucleic Acids Research, 1997, 25(17): 3389-3402. |
66 | EDDY S R. Where did the BLOSUM62 alignment score matrix come from?[J]. Nature Biotechnology, 2004, 22(8): 1035-1036. |
67 | CHOU K C, SHEN H B. MemType-2L: a Web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM[J]. Biochemical and Biophysical Research Communications, 2007, 360(2): 339-345. |
68 | SUN T L, ZHOU B, LAI L H, et al. Sequence-based prediction of protein protein interaction using a deep-learning algorithm[J]. BMC Bioinformatics, 2017, 18(1): 1-8. |
69 | MIIKKULAINEN R, LIANG J, MEYERSON E, et al. Evolving deep neural networks[M/OL]//Artificial intelligence in the age of neural networks and brain computing. Amsterdam: Elsevier, 2019: 293-312 [2023-10-01]. |
70 | GU J X, WANG Z H, KUEN J, et al. Recent advances in convolutional neural networks[J]. Pattern Recognition, 2018, 77: 354-377. |
71 | SHERSTINSKY A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network[J]. Physica D: Nonlinear Phenomena, 2020, 404: 132306. |
72 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C/OL]//Proceedings of the 31st International Conference on Neural Information Processing Systems. December 4-9, 2017, Long Beach, California, USA. New York: ACM, 2017: 6000-6010 [2023-10-01]. |
73 | GUI Y M, WANG R J, WEI Y Y, et al. Construction of protein-protein interactions model by deep neural networks[C/OL]//Proceedings of the 2018 International Workshop on Bioinformatics, Biochemistry, Biomedical Sciences (BBBS 2018). April 14-15, 2018, Hangzhou City, China. Paris, France: Atlantis Press, 2018: 221-229 [2023-10-01]. |
74 | ZHANG L, YU G X, XIA D W, et al. Protein-protein interactions prediction based on ensemble deep neural networks[J]. Neurocomputing, 2019, 324: 10-19. |
75 | WANG X E, WU Y J, WANG R J, et al. A novel matrix of sequence descriptors for predicting protein-protein interactions from amino acid sequences[J]. PLoS One, 2019, 14(6): e0217312. |
76 | WANG X, WANG R J, WEI Y Y, et al. A novel conjoint triad auto covariance (CTAC) coding method for predicting protein-protein interaction based on amino acid sequence[J]. Mathematical Biosciences, 2019, 313: 41-47. |
77 | YAO Y, DU X Q, DIAO Y Y, et al. An integration of deep learning with feature embedding for protein-protein interaction prediction[J]. PeerJ, 2019, 7: e7126. |
78 | GUI Y M, WANG R J, WEI Y Y, et al. DNN-PPI: a large-scale prediction of protein-protein interactions based on deep neural networks[J]. Journal of Biological Systems, 2019, 27(1): 1-18. |
79 | HANGGARA F S, ANAM K. Sequence-based protein-protein interaction prediction using greedy layer-wise training of deep neural networks[C/OL]//AIP Conference Proceedings. Novosibirsk, Russia: AIP Publishing, 2020, 2278(1): 020050 [2023-10-01]. |
80 | GUI Y M, WANG R J, WANG X E, et al. Using deep neural networks to improve the performance of protein-protein interactions prediction[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2020, 34(13): 2052012. |
81 | MAHAPATRA S, GUPTA V R, SAHU S S, et al. Deep neural network and extreme gradient boosting based hybrid classifier for improved prediction of protein-protein interaction[J]. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2022, 19(1): 155-165. |
82 | PAN J, YOU Z H, LI L P, et al. DWPPI: a deep learning approach for predicting protein-protein interactions in plants based on multi-source information with a large-scale biological network[J]. Frontiers in Bioengineering and Biotechnology, 2022, 10: 807522. |
83 | KHANH LE N Q, KHA Q H. Prediction of protein-protein interactions through deep learning based on sequence feature extraction and interaction network[C/OL]//2022 IEEE Biomedical Circuits and Systems Conference (BioCAS). October 13-15, 2022, Taipei, China. IEEE, 2022: 539-543 [2023-10-01]. |
84 | WANG J H, WANG X D, CHEN W T. Prediction of protein interactions based on CT-DNN[C/OL]//Proceedings of the 2022 9th International Conference on Biomedical and Bioinformatics Engineering. November 10-13, 2022, Kyoto, Japan. New York: ACM, 2022: 81-87 [2023-10-01]. |
85 | MEWARA B, LALWANI S. Strengthening auto-feature engineering of deep learning architecture in protein-protein interaction prediction[C/OL]//SHARMA H, SHRIVASTAVA V, KUMARI BHARTI K, et al. Communication and intelligent systems: Proceedings of ICCIS 2021. Singapore: Springer, 2022: 1205-1216 [2023-10-01]. |
86 | HASHEMIFAR S, NEYSHABUR B, KHAN A A, et al. Predicting protein-protein interactions through sequence-based deep learning[J]. Bioinformatics, 2018, 34(17): i802-i810. |
87 | WANG L, WANG H F, LIU S R, et al. Predicting protein-protein interactions from matrix-based protein sequence using convolution neural network and feature-selective rotation forest[J]. Scientific Reports, 2019, 9: 9848. |
88 | WANG Y B, YOU Z H, YANG S, et al. A high efficient biological language model for predicting protein-protein interactions[J]. Cells, 2019, 8(2): 122. |
89 | WARDAH W, DEHZANGI A, TAHERZADEH G, et al. Predicting protein-peptide binding sites with a deep convolutional neural network[J]. Journal of Theoretical Biology, 2020, 496: 110278. |
90 | ZENG M, ZHANG F H, WU F X, et al. Protein-protein interaction site prediction through combining local and global features with deep neural networks[J]. Bioinformatics, 2020, 36(4): 1114-1120. |
91 | LI F F, ZHU F, LING X H, et al. Protein interaction network reconstruction through ensemble deep learning with attention mechanism[J]. Frontiers in Bioengineering and Biotechnology, 2020, 8: 390. |
92 | XIE Z Y, DENG X Y, SHU K X. Prediction of protein-protein interaction sites using convolutional neural network and improved data sets[J]. International Journal of Molecular Sciences, 2020, 21(2): 467. |
93 | YANG X D, YANG S P, LIAN X Y, et al. Transfer learning via multi-scale convolutional neural layers for human-virus protein-protein interaction prediction[J]. Bioinformatics, 2021, 37(24): 4771-4778. |
94 | WANG Y, LI Z C, ZHANG Y F, et al. Performance improvement for a 2D convolutional neural network by using SSC encoding on protein-protein interaction tasks[J]. BMC Bioinformatics, 2021, 22(1): 184. |
95 | SLEDZIESKI S, SINGH R, COWEN L, et al. D-SCRIPT translates genome to phenome with sequence-based, structure-aware, genome-scale predictions of protein-protein interactions[J]. Cell Systems, 2021, 12(10): 969-982.e6. |
96 | LIU-WEI W, KAFKAS Ş, CHEN J, et al. DeepViral: prediction of novel virus-host interactions from protein sequences and infectious disease phenotypes[J]. Bioinformatics, 2021, 37(17): 2722-2729. |
97 | HU X T, FENG C, ZHOU Y C, et al. DeepTrio: a ternary prediction system for protein-protein interaction using mask multiple parallel convolutional neural networks[J]. Bioinformatics, 2022, 38(3): 694-702. |
98 | GAO H L, CHEN C, LI S Y, et al. Prediction of protein-protein interactions based on ensemble residual convolutional neural network[J]. Computers in Biology and Medicine, 2023, 152: 106471. |
99 | SOLEYMANI F, PAQUET E, VIKTOR H L, et al. ProtInteract: a deep learning framework for predicting protein-protein interactions[J]. Computational and Structural Biotechnology Journal, 2023, 21: 1324-1348. |
100 | GONZALEZ-LOPEZ F, MORALES-CORDOVILLA J A, VILLEGAS-MORCILLO A, et al. End-to-end prediction of protein-protein interaction based on embedding and recurrent neural networks[C/OL]//2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). December 3-6, 2018, Madrid, Spain. IEEE, 2019: 2344-2350 [2023-10-01]. |
101 | ZHANG B Z, LI J Y, QUAN L J, et al. Sequence-based prediction of protein-protein interaction sites by simplified long short-term memory network[J]. Neurocomputing, 2019, 357: 86-100. |
102 | ALAKUS T B, TURKOGLU I. Prediction of protein-protein interactions with LSTM deep learning model[C/OL]//2019 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT). October 11-13, 2019, Ankara, Turkey. IEEE, 2019: 1-5 [2023-10-01]. |
103 | YANG L, HAN Y K, ZHANG H X, et al. Prediction of protein-protein interactions with local weight-sharing mechanism in deep learning[J]. BioMed Research International, 2020, 2020: 5072520. |
104 | XU H X, XU D, ZHANG N Q, et al. Protein-protein interaction prediction based on spectral radius and general regression neural network[J]. Journal of Proteome Research, 2021, 20(3): 1657-1665. |
105 | TSUKIYAMA S, HASAN M M, FUJII S, et al. LSTM-PHV: prediction of human-virus protein-protein interactions by LSTM with word2vec[J]. Briefings in Bioinformatics, 2021, 22(6): bbab228. |
106 | MEWARA B, LALWANI S. Sequence-based prediction of protein-protein interaction using auto-feature engineering of RNN-based model[J]. Research on Biomedical Engineering, 2023, 39(1): 259-272. |
107 | TANG M L, WU L X, YU X Y, et al. Prediction of protein-protein interaction sites based on stratified attentional mechanisms[J]. Frontiers in Genetics, 2021, 12: 784863. |
108 | TSUKIYAMA S, KURATA H. Cross-attention PHV: prediction of human and virus protein-protein interactions using cross-attention-based neural networks[J]. Computational and Structural Biotechnology Journal, 2022, 20: 5564-5573. |
109 | ASIM M N, IBRAHIM M A, MALIK M I, et al. ADH-PPI: an attention-based deep hybrid model for protein-protein interaction prediction[J]. iScience, 2022, 25(10): 105169. |
110 | LI X, HAN P F, WANG G, et al. SDNN-PPI: self-attention with deep neural network effect on protein-protein interaction prediction[J]. BMC Genomics, 2022, 23(1): 474. |
111 | MOU M J, PAN Z Q, ZHOU Z M, et al. A transformer-based ensemble framework for the prediction of protein-protein interaction sites[J]. Research, 2023, 6: 0240. |
112 | LI H, GONG X J, YU H A, et al. Deep neural network based predictions of protein interactions using primary sequences[J]. Molecules, 2018, 23(8): 1923. |
113 | CHEN M H, JU C J T, ZHOU G Y, et al. Multifaceted protein-protein interaction prediction based on Siamese residual RCNN[J]. Bioinformatics, 2019, 35(14): i305-i314. |
114 | GUO Y, CHEN X. A deep learning framework for improving protein interaction prediction using sequence properties[EB/OL]. bioRxiv, 2019: 843755[2023-10-01]. |
115 | LI Y W, GOLDING G B, ILIE L. DELPHI: accurate deep ensemble model for protein interaction sites prediction[J]. Bioinformatics, 2021, 37(7): 896-904. |
116 | XU W X, GAO Y Y, WANG Y, et al. Protein-protein interaction prediction based on ordinal regression and recurrent convolutional neural networks[J]. BMC Bioinformatics, 2021, 22: 485. |
117 | ALBU A I, BOCICOR M I, CZIBULA G. MM-StackEns: a new deep multimodal stacked generalization approach for protein-protein interaction prediction[J]. Computers in Biology and Medicine, 2023, 153: 106526. |
118 | DU X Q, SUN S W, HU C L, et al. DeepPPI: boosting prediction of protein-protein interactions with deep neural networks[J]. Journal of Chemical Information and Modeling, 2017, 57(6): 1499-1510. |
119 | YANG L, XIA J F, GUI J E. Prediction of protein-protein interactions from protein sequence using local descriptors[J]. Protein & Peptide Letters, 2010, 17(9): 1085-1090. |
120 | CHUNG J, GULCEHRE C, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[EB/OL]. arXiv, 2014: 1412.3555[2023-10-01]. |
121 | NAMBIAR A, LIU S, HEFLIN M, et al. Transformer neural networks for protein family and interaction prediction tasks[J]. Journal of Computational Biology, 2023, 30(1): 95-111. |
122 | JAISWAL A, BABU A R, ZADEH M Z, et al. A survey on contrastive self-supervised learning[J]. Technologies, 2020, 9(1): 2. |
123 | HU X, CHU L Y, PEI J, et al. Model complexity of deep learning: a survey[J]. Knowledge and Information Systems, 2021, 63(10): 2585-2619. |
124 | OUYANG L, WU J, JIANG X, et al. Training language models to follow instructions with human feedback[C/OL]//Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022, 35: 27730-27744 [2023-10-01]. |
125 | MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[EB/OL]. arXiv, 2013: 1301.3781[2023-10-01]. |
126 | ELNAGGAR A, HEINZINGER M, DALLAGO C, et al. ProtTrans: toward understanding the language of life through self-supervised learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(10): 7112-7127. |
127 | JUMPER J, EVANS R, PRITZEL A, et al. Highly accurate protein structure prediction with AlphaFold[J]. Nature, 2021, 596(7873): 583-589. |