Synthetic Biology Journal ›› 2023, Vol. 4 ›› Issue (3): 488-506. DOI: 10.12211/2096-8280.2022-078
Yidong SONG, Qianmu YUAN, Yuedong YANG
Received: 2022-12-31
Revised: 2023-03-07
Online: 2023-06-30
Published: 2023-07-05
Corresponding author: Yuedong YANG
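Abstract:
Protein function prediction is an important task in bioinformatics and plays a key role in elucidating disease mechanisms and discovering drug targets. Because traditional biochemical experiments for determining protein function are usually costly, time-consuming, and low-throughput, developing efficient and accurate computational methods for protein function prediction is essential. Protein function prediction can be divided into residue-level binding site prediction and protein-level gene ontology (GO) prediction. This review first introduces the databases and protein features commonly used in the field, then summarizes the latest protein function prediction methods. For binding site prediction, recent methods are presented by ligand type: protein-protein, protein-peptide, protein-nucleic acid, and protein-small molecule or ion binding site prediction. For GO prediction, recent methods are presented by category: sequence-based, structure-based, and protein-protein interaction network-based. Finally, the review summarizes current protein function prediction methods, analyzes their strengths and weaknesses, and discusses future directions for the field.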
Yidong SONG, Qianmu YUAN, Yuedong YANG. Application of deep learning in protein function prediction[J]. Synthetic Biology Journal, 2023, 4(3): 488-506.
Name | Content | Download |
---|---|---|
PDB | Protein structures | https://www.rcsb.org/ |
BioLiP | Ligand-protein interaction data | https://zhanggroup.org/BioLiP/ |
UniProt | Protein sequences and annotations | https://www.uniprot.org/ |
GO | Cellular component, molecular function, biological process | http://geneontology.org/ |
GOA | Gene ontology annotations | https://www.ebi.ac.uk/GOA/ |

Table 1 Commonly used databases
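Category | Method | Data source | Year | Features① | Algorithm | Open source② |
---|---|---|---|---|---|---|
Protein-protein | SPPIDER | PDB | 2007 | Physicochemical properties, MSA-based evolutionary information, DSSP structural information, dSA (difference between predicted and actual RSA) | Fully connected neural network | S |
 | SCRIBER | BioLiP | 2019 | Relative solvent accessibility, evolutionary conservation, relative amino acid binding propensity, physicochemical properties, intrinsic disorder, secondary structure, residue position | Logistic regression | S |
 | DELPHI | PDB, BioLiP | 2020 | High-scoring segment pairs, ProtVec1D, PSSM, evolutionary conservation, relative solvent accessibility, relative amino acid binding propensity, hydrophilicity, intrinsic disorder, physicochemical properties, PKx, position information | CNN+GRU | S, C |
 | DeepPPISP | PDB | 2020 | PSSM, secondary structure, one-hot protein sequence | CNN | S, C |
 | MaSIF | — | 2020 | Surface geometric and physicochemical features, such as local curvature, Poisson-Boltzmann electrostatics, hydrogen-bond donors or acceptors, and hydrophilicity | Geometric deep learning | C |
 | GraphPPIS | PDB | 2021 | PSSM, HMM, DSSP | GCN | S, C |
Protein-peptide | SPRINT | PDB | 2016 | One-hot protein sequence, PSSM, relative solvent accessibility, secondary structure, physicochemical properties | SVM | S |
 | PepBind | BioLiP | 2018 | PSSM, HMM, secondary structure, intrinsic disorder | SVM + template-based method | S |
 | Visual | BioLiP | 2020 | PSSM, half-sphere exposure, secondary structure, solvent accessibility, torsion angles, physicochemical properties | CNN | C |
 | | BioLiP | 2021 | Voxelized densities of 11 atom types | 3D CNN | S, C |
 | PepNN | PDB | 2022 | Inter-residue distances, relative Cα orientations, rotation matrices between local coordinate frames, relative residue positions, one-hot protein sequence, backbone torsion angles, language model features | Reciprocal attention + GNN | C |
Protein-nucleic acid | DNAPred | PDB | 2019 | PSSM, predicted secondary structure and solvent accessibility, frequency difference between binding and non-binding amino acids | SVM | S |
 | NucBind | PDB | 2019 | PSSM, HMM, predicted secondary structure, predicted structure | SVM + COACH-D | S |
 | NCBRPred | — | 2021 | PSSM, HMM, predicted secondary structure and solvent accessibility | GRU | S, C |
 | GraphBind | BioLiP | 2021 | Atomic features of residues, DSSP, PSSM, HMM | GNN | S, C |
 | GraphSite | BioLiP | 2022 | AlphaFold2 single representation, PSSM, HMM, DSSP | Graph Transformer | S, C |
Protein-small molecule or ion ligand | TargetS | PDB | 2013 | PSSM, predicted secondary structure, relative amino acid binding propensity | AdaBoost | S |
 | IonCom | BioLiP | 2016 | PSSM, predicted secondary structure and solvent accessibility, conservation, ion-binding frequency of amino acids, predicted structure | AdaBoost + SVM + COFACTOR | S, C |
 | MIB | PDB | 2016 | Structural template data | Fragment Transformation | S |
 | DELIA | BioLiP | 2020 | PSSM, HMM, secondary structure, solubility, S-SITE features, structure-based distance matrix | CNN | S |
 | LMetalSite | BioLiP | 2022 | Language model features | Transformer + multi-task learning | S, C |
Multiple ligand types | MTDsite | BioLiP | 2021 | PSSM, HMM, SPIDER3, solvent-accessible surface area, torsion angles, number of residues within a cutoff, half-sphere exposure | BiLSTM + multi-task learning | C |
 | DeepDISOBind | DisProt | 2022 | One-hot protein sequence, relative amino acid affinity, secondary structure, intrinsic disorder | CNN + multi-task learning | S, C |

Table 2 Summary of the latest binding site prediction methods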
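Category | Method | Year | Features | Algorithm | Open source① |
---|---|---|---|---|---|
Sequence-based | GOLabeler | 2018 | GO term frequency, sequence alignment information, amino acid trigrams (3-mers), protein family information, domains and motifs, ProFET | LTR | S, C |
 | DeepGOPlus | 2020 | Sequence- and motif-based functional information | CNN | S, C |
 | TALE | 2021 | One-hot protein sequence, GO hierarchy matrix, sequence similarity | Transformer+CNN | C |
 | GAT-GO | 2022 | One-hot protein sequence, PSSM, HMM, ESM-1b embeddings | GAT | |
 | DeeProtGO | 2022 | SeqVec sequence embeddings, sequence similarity, taxonomy, InterPro protein domain and family information, GO annotations | Hierarchical fully connected neural network | C |
Structure-based | COFACTOR | 2017 | Protein sequence, structural information, and PPI network | Sequence alignment + structure alignment + network-neighbor-based function aggregation | S |
 | DeepFRI | 2021 | Protein contact map, language model features | GCN | S, C |
Network-based | DeepGO | 2018 | Protein sequence, PPI network | CNN + hierarchical fully connected neural network | S, C |
 | NetGO | 2019 | GO term frequency, sequence alignment information, amino acid trigrams (3-mers), protein family information, domains and motifs, ProFET | LTR | S |
 | NetGO 2.0 | 2021 | GO term frequency, sequence-based information, protein-protein interaction network, deep patterns in sequences, literature information | LTR | S |
 | S2F | 2021 | Homology information, HMMER features, InterPro features, evolutionary information, PPI network | Label diffusion | S, C |
 | DeepGraphGO | 2021 | InterPro features, PPI network | GCN | C |

Table 3 Summary of the latest GO prediction methods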