Synthetic Biology Journal, 2023, Vol. 4, Issue 3: 488-506. DOI: 10.12211/2096-8280.2022-078
• Invited Review •
Yidong SONG, Qianmu YUAN, Yuedong YANG
Received: 2022-12-31
Revised: 2023-03-07
Online: 2023-07-05
Published: 2023-06-30
Contact: Yuedong YANG
Yidong SONG, Qianmu YUAN, Yuedong YANG. Application of deep learning in protein function prediction[J]. Synthetic Biology Journal, 2023, 4(3): 488-506.
URL: https://synbioj.cip.com.cn/EN/10.12211/2096-8280.2022-078
Name | Content | URL |
---|---|---|
PDB | Protein structures | https://www.rcsb.org/ |
BioLiP | Ligand-protein interaction data | https://zhanggroup.org/BioLiP/ |
UniProt | Protein sequences and annotations | https://www.uniprot.org/ |
GO | Cellular component, molecular function, biological process | http://geneontology.org/ |
GOA | Gene Ontology annotation data | https://www.ebi.ac.uk/GOA/ |
Table 1 Commonly used databases
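Category | Method | Data source | Year | Features① | Algorithm | Open source② |
---|---|---|---|---|---|---|
Protein-protein | SPPIDER | PDB | 2007 | Physicochemical properties, MSA-based evolutionary information, DSSP structural information, dSA (difference between predicted and actual RSA) | Fully connected neural network | S |
Protein-protein | SCRIBER | BioLiP | 2019 | Relative solvent accessibility, evolutionary conservation, relative amino acid binding propensity, physicochemical properties, intrinsic disorder, secondary structure, residue position | Logistic regression | S |
Protein-protein | DELPHI | PDB, BioLiP | 2020 | High-scoring segment pairs, ProtVec1D, PSSM, evolutionary conservation, relative solvent accessibility, relative amino acid binding propensity, hydrophilicity, intrinsic disorder, physicochemical properties, PKx, position information | CNN+GRU | S, C |
Protein-protein | DeepPPISP | PDB | 2020 | PSSM, secondary structure, one-hot protein sequence | CNN | S, C |
Protein-protein | MaSIF | — | 2020 | Surface geometric and physicochemical features, such as local curvature, Poisson-Boltzmann electrostatics, hydrogen-bond donors or acceptors, and hydrophilicity | Geometric deep learning | C |
Protein-protein | GraphPPIS | PDB | 2021 | PSSM, HMM, DSSP | GCN | S, C |
Protein-peptide | SPRINT | PDB | 2016 | One-hot protein sequence, PSSM, relative solvent accessibility, secondary structure, physicochemical properties | SVM | S |
Protein-peptide | PepBind | BioLiP | 2018 | PSSM, HMM, secondary structure, intrinsic disorder | SVM + template-based method | S |
Protein-peptide | Visual | BioLiP | 2020 | PSSM, half-sphere exposure, secondary structure, solvent accessibility, torsion angles, physicochemical properties | CNN | C |
Protein-peptide |  | BioLiP | 2021 | Voxelized densities of 11 atom types | 3D CNN | S, C |
Protein-peptide | PepNN | PDB | 2022 | Inter-residue distances, relative orientation of Cα atoms, rotation matrices between local coordinate frames, relative residue positions, one-hot protein sequence, backbone torsion angles, language-model features | Reciprocal attention + GNN | C |
Protein-nucleic acid | DNAPred | PDB | 2019 | PSSM, predicted secondary structure and solvent accessibility, frequency difference between binding and non-binding amino acids | SVM | S |
Protein-nucleic acid | NucBind | PDB | 2019 | PSSM, HMM, predicted secondary structure, predicted structure | SVM+COACH-D | S |
Protein-nucleic acid | NCBRPred | — | 2021 | PSSM, HMM, predicted secondary structure and solvent accessibility | GRU | S, C |
Protein-nucleic acid | GraphBind | BioLiP | 2021 | Atomic features of residues, DSSP, PSSM, HMM | GNN | S, C |
Protein-nucleic acid | GraphSite | BioLiP | 2022 | AlphaFold2 single representation, PSSM, HMM, DSSP | Graph Transformer | S, C |
Protein-small molecule or ion ligand | TargetS | PDB | 2013 | PSSM, predicted secondary structure, relative amino acid binding propensity | AdaBoost | S |
Protein-small molecule or ion ligand | IonCom | BioLiP | 2016 | PSSM, predicted secondary structure and solvent accessibility, conservation, ion-binding frequency of amino acids, predicted structure | AdaBoost+SVM+COFACTOR | S, C |
Protein-small molecule or ion ligand | MIB | PDB | 2016 | Structural template data | Fragment Transformation | S |
Protein-small molecule or ion ligand | DELIA | BioLiP | 2020 | PSSM, HMM, secondary structure, solubility, S-SITE features, structure-based distance matrix | CNN | S |
Protein-small molecule or ion ligand | LMetalSite | BioLiP | 2022 | Language-model features | Transformer + multi-task learning | S, C |
Multiple ligand types | MTDsite | BioLiP | 2021 | PSSM, HMM, SPIDER3, solvent-accessible surface area, torsion angles, number of residues within a cutoff distance, half-sphere exposure | BiLSTM + multi-task learning | C |
Multiple ligand types | DeepDISOBind | DisProt | 2022 | One-hot protein sequence, relative amino acid propensity, secondary structure, intrinsic disorder | CNN + multi-task learning | S, C |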
Table 2 Summary of the latest binding site prediction methods
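Category | Method | Year | Features | Algorithm | Open source① |
---|---|---|---|---|---|
Sequence-based | GOLabeler | 2018 | GO term frequency, sequence alignment information, amino acid trigrams (3-mers), protein family information, domains and motifs, ProFET | LTR | S, C |
Sequence-based | DeepGOPlus | 2020 | Sequence- and motif-based functional information | CNN | S, C |
Sequence-based | TALE | 2021 | One-hot protein sequence, GO hierarchy matrix, sequence similarity | Transformer+CNN | C |
Sequence-based | GAT-GO | 2022 | One-hot protein sequence, PSSM, HMM, ESM-1b embeddings | GAT |  |
Sequence-based | DeeProtGO | 2022 | SeqVec sequence embeddings, sequence similarity, taxonomy, InterPro protein domain and family information, GO annotations | Hierarchical fully connected neural network | C |
Structure-based | COFACTOR | 2017 | Protein sequence, structural information and PPI network | Sequence alignment + structure alignment + network-neighbor-based function aggregation | S |
Structure-based | DeepFRI | 2021 | Protein contact maps, language-model features | GCN | S, C |
Network-based | DeepGO | 2018 | Protein sequence, PPI network | CNN + hierarchical fully connected neural network | S, C |
Network-based | NetGO | 2019 | GO term frequency, sequence alignment information, amino acid trigrams (3-mers), protein family information, domains and motifs, ProFET | LTR | S |
Network-based | NetGO 2.0 | 2021 | GO term frequency, sequence-based information, protein-protein interaction network, deep patterns in sequences, literature information | LTR | S |
Network-based | S2F | 2021 | Homology information, HMMER features, InterPro features, evolutionary information, PPI network | Label diffusion | S, C |
Network-based | DeepGraphGO | 2021 | InterPro features, PPI network | GCN | C |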
Table 3 Summary of the latest GO prediction methods
1 | EISENBERG D, MARCOTTE E M, XENARIOS I, et al. Protein function in the post-genomic era[J]. Nature, 2000, 405(6788): 823-826. |
2 | RADIVOJAC P, CLARK W T, ORON T R, et al. A large-scale evaluation of computational protein function prediction[J]. Nature Methods, 2013, 10(3): 221-227. |
3 | ISRALEWITZ B, BAUDRY J, GULLINGSRUD J, et al. Steered molecular dynamics investigations of protein function[J]. Journal of Molecular Graphics & Modelling, 2001, 19(1): 13-25. |
4 | KLEPEIS J L, LINDORFF-LARSEN K, DROR R O, et al. Long-timescale molecular dynamics simulations of protein structure and function[J]. Current Opinion in Structural Biology, 2009, 19(2): 120-127. |
5 | PIERRI C L, PARISI G, PORCELLI V. Computational approaches for protein function prediction: a combined strategy from multiple sequence alignment to molecular docking-based virtual screening[J]. Biochimica et Biophysica Acta (BBA)-Proteins and Proteomics, 2010, 1804(9): 1695-1712. |
6 | YUAN Q M, CHEN S, RAO J H, et al. AlphaFold2-aware protein-DNA binding site prediction using graph transformer[J]. Briefings in Bioinformatics, 2022, 23(2): bbab564. |
7 | XIA Y, XIA C Q, PAN X Y, et al. GraphBind: protein structural context embedded rules learned by hierarchical graph neural networks for recognizing nucleic-acid-binding residues[J]. Nucleic Acids Research, 2021, 49(9): e51. |
8 | YUAN Q M, CHEN J W, ZHAO H Y, et al. Structure-aware protein-protein interaction site prediction using deep graph convolutional network[J]. Bioinformatics, 2021, 38(1): 125-132. |
9 | KULMANOV M, HOEHNDORF R. DeepGOPlus: improved protein function prediction from sequence[J]. Bioinformatics, 2020, 36(2): 422-429. |
10 | ZHANG J, KURGAN L. Review and comparative assessment of sequence-based predictors of protein-binding residues[J]. Briefings in Bioinformatics, 2018, 19(5): 821-837. |
11 | KUZMANOV U, EMILI A. Protein-protein interaction networks: probing disease mechanisms using model systems[J]. Genome Medicine, 2013, 5(4): 37. |
12 | WELLS J A, MCCLENDON C L. Reaching for high-hanging fruit in drug discovery at protein-protein interfaces[J]. Nature, 2007, 450(7172): 1001-1009. |
13 | LI Y W, GOLDING G B, ILIE L. DELPHI: accurate deep ensemble model for protein interaction sites prediction[J]. Bioinformatics, 2021, 37(7): 896-904. |
14 | ABDIN O, NIM S, WEN H, et al. PepNN: a deep attention model for the identification of peptide binding sites[J]. Communications Biology, 2022, 5: 503. |
15 | CHEN J W, XIE Z R, WU Y H. Understand protein functions by comparing the similarity of local structural environments[J]. Biochimica et Biophysica Acta (BBA)-Proteins and Proteomics, 2017, 1865(2): 142-152. |
16 | LIN Y F, CHENG C W, SHIH C S, et al. MIB: metal ion-binding site prediction and docking server[J]. Journal of Chemical Information and Modeling, 2016, 56(12): 2287-2291. |
17 | XIA C Q, PAN X Y, SHEN H B. Protein-ligand binding residue prediction enhancement through hybrid deep heterogeneous learning of sequence and structure data[J]. Bioinformatics, 2020, 36(10): 3018-3027. |
18 | YANG J Y, ROY A, ZHANG Y. Protein-ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment[J]. Bioinformatics, 2013, 29(20): 2588-2595. |
19 | HU X Z, DONG Q W, YANG J Y, et al. Recognizing metal and acid radical ion-binding sites by integrating ab initio modeling with template-based transferals[J]. Bioinformatics, 2016, 32(21): 3260-3269. |
20 | ASHBURNER M, BALL C A, BLAKE J A, et al. Gene ontology: tool for the unification of biology[J]. Nature Genetics, 2000, 25(1): 25-29. |
21 | DAVIS J, GOADRICH M. The relationship between Precision-Recall and ROC curves[C]//Proceedings of the 23rd international conference on Machine learning. June 25-29, 2006, Pittsburgh, Pennsylvania, USA. New York: ACM, 2006: 233-240. |
22 | CONESA A, GÖTZ S, GARCÍA-GÓMEZ J M, et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research[J]. Bioinformatics, 2005, 21(18): 3674-3676. |
23 | YOU R H, ZHANG Z H, XIONG Y, et al. GOLabeler: improving sequence-based large-scale protein function prediction by learning to rank[J]. Bioinformatics, 2018, 34(14): 2465-2473. |
24 | LI H. A short introduction to learning to rank[J]. IEICE Transactions on Information and Systems, 2011, E94-D(10): 1854-1862. |
25 | CAO Y, SHEN Y. TALE: Transformer-based protein function Annotation with joint sequence-Label Embedding[J]. Bioinformatics, 2021, 37(18): 2825-2833. |
26 | GLIGORIJEVIĆ V, DOUGLAS RENFREW P, KOSCIOLEK T, et al. Structure-based protein function prediction using graph convolutional networks[J]. Nature Communications, 2021, 12: 3168. |
27 | OLIVER S. Guilt-by-association goes global[J]. Nature, 2000, 403(6770): 601-602. |
28 | YOU R H, YAO S W, XIONG Y, et al. NetGO: improving large-scale protein function prediction with massive network information[J]. Nucleic Acids Research, 2019, 47(W1): W379-W387. |
29 | SZKLARCZYK D, GABLE A L, NASTOU K C, et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets[J]. Nucleic Acids Research, 2021, 49(D1): D605-D612. |
30 | YAO S W, YOU R H, WANG S J, et al. NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information[J]. Nucleic Acids Research, 2021, 49(W1): W469-W475. |
31 | WANG S Y, LIANG K, HU Q S, et al. JAK2-binding long noncoding RNA promotes breast cancer brain metastasis[J]. The Journal of Clinical Investigation, 2017, 127(12): 4498-4515. |
32 | TIRALONGO J, COOPER O, LITFIN T, et al. YesU from Bacillus subtilis preferentially binds fucosylated glycans[J]. Scientific Reports, 2018, 8: 13139. |
33 | KUMAR R, CORBETT M A, VAN BON B W M, et al. THOC2 mutations implicate mRNA-export pathway in X-linked intellectual disability[J]. The American Journal of Human Genetics, 2015, 97(2): 302-310. |
34 | SCHMIDTKE P, BARRIL X. Understanding and predicting druggability. A high-throughput method for detection of drug binding sites[J]. Journal of Medicinal Chemistry, 2010, 53(15): 5858-5867. |
35 | XU M Y, RAN T, CHEN H M. De novo molecule design through the molecular generative model conditioned by 3D information of protein binding sites[J]. Journal of Chemical Information and Modeling, 2021, 61(7): 3240-3254. |
36 | HEFFERNAN R, YANG Y D, PALIWAL K, et al. Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility[J]. Bioinformatics, 2017, 33(18): 2842-2849. |
37 | ALTSCHUL S F, MADDEN T L, SCHÄFFER A A, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs[J]. Nucleic Acids Research, 1997, 25(17): 3389-3402. |
38 | SUZEK B E, HUANG H Z, MCGARVEY P, et al. UniRef: comprehensive and non-redundant UniProt reference clusters[J]. Bioinformatics, 2007, 23(10): 1282-1288. |
39 | REMMERT M, BIEGERT A, HAUSER A, et al. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment[J]. Nature Methods, 2012, 9(2): 173-175. |
40 | MIRDITA M, VON DEN DRIESCH L, GALIEZ C, et al. Uniclust databases of clustered and deeply annotated protein sequences and alignments[J]. Nucleic Acids Research, 2017, 45(D1): D170-D176. |
41 | MEILER J, MÜLLER M, ZEIDLER A, et al. Generation and evaluation of dimension-reduced amino acid parameter representations by artificial neural networks[J]. Molecular Modeling Annual, 2001, 7(9): 360-369. |
42 | RIVES A, MEIER J, SERCU T, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences[J]. Proceedings of the National Academy of Sciences of the United States of America, 2021, 118(15): e2016239118. |
43 | ELNAGGAR A, HEINZINGER M, DALLAGO C, et al. ProtTrans: towards cracking the language of life's code through self-supervised deep learning and high performance computing[EB/OL]. arXiv, 2020: 2007.06225[2023-02-01]. |
44 | KABSCH W, SANDER C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features[J]. Biopolymers, 1983, 22(12): 2577-2637. |
45 | POROLLO A, MELLER J. Prediction-based fingerprints of protein-protein interactions[J]. Proteins: Structure, Function, and Bioinformatics, 2007, 66(3): 630-645. |
46 | ZHANG J, KURGAN L. SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences[J]. Bioinformatics, 2019, 35(14): i343-i353. |
47 | ZENG M, ZHANG F H, WU F X, et al. Protein-protein interaction site prediction through combining local and global features with deep neural networks[J]. Bioinformatics, 2020, 36(4): 1114-1120. |
48 | GAINZA P, SVERRISSON F, MONTI F, et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning[J]. Nature Methods, 2020, 17(2): 184-192. |
49 | TAHERZADEH G, YANG Y D, ZHANG T, et al. Sequence-based prediction of protein-peptide binding sites using support vector machine[J]. Journal of Computational Chemistry, 2016, 37(13): 1223-1229. |
50 | ZHAO Z J, PENG Z L, YANG J Y. Improving sequence-based prediction of protein-peptide binding residues by introducing intrinsic disorder and a consensus method[J]. Journal of Chemical Information and Modeling, 2018, 58(7): 1459-1468. |
51 | WARDAH W, DEHZANGI A, TAHERZADEH G, et al. Predicting protein-peptide binding sites with a deep convolutional neural network[J]. Journal of Theoretical Biology, 2020, 496: 110278. |
52 | ZHU Y H, HU J, SONG X N, et al. DNAPred: accurate identification of DNA-binding sites from protein sequence by ensembled hyperplane-distance-based support vector machines[J]. Journal of Chemical Information and Modeling, 2019, 59(6): 3057-3071. |
53 | SU H, LIU M C, SUN S S, et al. Improving the prediction of protein-nucleic acids binding residues via multiple sequence profiles and the consensus of complementary methods[J]. Bioinformatics, 2019, 35(6): 930-936. |
54 | WU Q, PENG Z L, ZHANG Y, et al. COACH-D: improved protein-ligand binding sites prediction with refined ligand-binding poses through molecular docking[J]. Nucleic Acids Research, 2018, 46(W1): W438-W442. |
55 | ZHANG J, CHEN Q C, LIU B. NCBRPred: predicting nucleic acid binding residues in proteins based on multilabel learning[J]. Briefings in Bioinformatics, 2021, 22(5): bbaa397. |
56 | YU D J, HU J, YANG J, et al. Designing template-free predictor for targeting protein-ligand binding sites with classifier ensemble and spatial clustering[J]. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013, 10(4): 994-1008. |
57 | ROY A, YANG J Y, ZHANG Y. COFACTOR: an accurate comparative algorithm for structure-based protein function annotation[J]. Nucleic Acids Research, 2012, 40(W1): W471-W477. |
58 | OFER D, LINIAL M. ProFET: feature engineering captures high-level protein functions[J]. Bioinformatics, 2015, 31(21): 3429-3436. |
59 | KOZLOVSKII I, POPOV P. Protein-peptide binding site detection using 3D convolutional neural networks[J]. Journal of Chemical Information and Modeling, 2021, 61(8): 3814-3823. |
60 | CHO K, VAN MERRIENBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[EB/OL]. arXiv, 2014: 1406.1078[2023-02-01]. |
61 | GRAVES A. Long short-term memory[M]//Studies in Computational Intelligence: Supervised sequence labelling with recurrent neural networks. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012: 37-45. |
62 | LECUN Y, BENGIO Y. Convolutional networks for images, speech, and time series[M/OL]//The handbook of brain theory and neural networks. Cambridge, MA, USA: MIT Press, 1995[2023-02-01]. |
63 | YUAN Q M, CHEN S, WANG Y, et al. Alignment-free metal ion-binding site prediction from protein sequence through pretrained language model and multi-task learning[J]. Briefings in Bioinformatics, 2022, 23(6): bbac444. |
64 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C/OL]//Advances in Neural Information Processing Systems 30-NeurIPS 2017[2023-02-01]. |
65 | ZHENG S J, RAO J H, ZHANG Z Y, et al. Predicting retrosynthetic reactions using self-corrected transformer neural networks[J]. Journal of Chemical Information and Modeling, 2020, 60(1): 47-55. |
66 | FINN C, ABBEEL P, LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]//Proceedings of the 34th International Conference on Machine Learning-Volume 70. August 6-11, 2017, Sydney, NSW, Australia. New York: ACM, 2017: 1126-1135. |
67 | WANG J H, ZHENG S J, CHEN J W, et al. Meta learning for low-resource molecular optimization[J]. Journal of Chemical Information and Modeling, 2021, 61(4): 1627-1636. |
68 | SUN Z, ZHENG S J, ZHAO H Y, et al. To improve prediction of binding residues with DNA, RNA, carbohydrate, and peptide via multi-task deep neural networks[J]. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2022, 19(6): 3735-3743. |
69 | ZHANG F H, ZHAO B, SHI W B, et al. DeepDISOBind: accurate prediction of RNA-, DNA- and protein-binding intrinsically disordered residues with deep multi-task learning[J]. Briefings in Bioinformatics, 2022, 23(1): bbab521. |
70 | ZHANG Y, YANG Q. An overview of multi-task learning[J]. National Science Review, 2018, 5(1): 30-43. |
71 | CARUANA R. Multitask learning[J]. Machine Learning, 1997, 28(1): 41-75. |
72 | MERINO G A, SAIDI R, MILONE D H, et al. Hierarchical deep learning for predicting GO annotations by integrating protein knowledge[J]. Bioinformatics, 2022, 38(19): 4488-4496. |
73 | ZHANG C X, FREDDOLINO P L, ZHANG Y. COFACTOR: improved protein function prediction by combining structure, sequence and protein-protein interaction information[J]. Nucleic Acids Research, 2017, 45(W1): W291-W299. |
74 | KULMANOV M, KHAN M A, HOEHNDORF R. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier[J]. Bioinformatics, 2018, 34(4): 660-668. |
75 | LAI B Q, XU J B. Accurate protein function prediction via graph attention networks with predicted structure information[J]. Briefings in Bioinformatics, 2022, 23(1): bbab502. |
76 | XU J B, MCPARTLON M, LI J. Improved protein structure prediction by deep learning irrespective of co-evolution information[J]. Nature Machine Intelligence, 2021, 3(7): 601-609. |
77 | ALTSCHUL S F, GISH W, MILLER W, et al. Basic local alignment search tool[J]. Journal of Molecular Biology, 1990, 215(3): 403-410. |
78 | VILLEGAS-MORCILLO A, MAKRODIMITRIS S, VAN HAM R C H J, et al. Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function[J]. Bioinformatics, 2021, 37(2): 162-170. |
79 | VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks[EB/OL]. arXiv, 2017[2023-02-01]. |
80 | LEE J Y, LEE I Y, KANG J W. Self-attention graph pooling[C/OL]//Proceedings of the 36th International Conference on Machine Learning, 9-15 June 2019, Long Beach, California, USA, 97: 3734-3743[2023-02-01]. |
81 | BOUTET E, LIEBERHERR D, TOGNOLLI M, et al. UniProtKB/Swiss-prot, the manually annotated section of the UniProt KnowledgeBase: how to use the entry view[M]//Plant Bioinformatics. New York: Springer New York, 2016: 23-54. |
82 | TORRES M, YANG H X, ROMERO A E, et al. Protein function prediction for newly sequenced organisms[J]. Nature Machine Intelligence, 2021, 3(12): 1050-1060. |
83 | MOSTAFAVI S, RAY D, WARDE-FARLEY D, et al. GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function[J]. Genome Biology, 2008, 9(): S4. |
84 | YOU R H, YAO S W, MAMITSUKA H, et al. DeepGraphGO: graph neural network for large-scale, multispecies protein function prediction[J]. Bioinformatics, 2021, 37(): i262-i271. |
85 | KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. arXiv, 2016: 1609.02907[2023-02-01]. |
86 | MITCHELL A L, ATTWOOD T K, BABBITT P C, et al. InterPro in 2019: improving coverage, classification and access to protein sequence annotations[J]. Nucleic Acids Research, 2019, 47(D1): D351-D360. |
87 | FINN R D, COGGILL P, EBERHARDT R Y, et al. The Pfam protein families database: towards a more sustainable future[J]. Nucleic Acids Research, 2016, 44(D1): D279-D285. |
88 | OATES M E, STAHLHACKE J, VAVOULIS D V, et al. The SUPERFAMILY 1.75 database in 2014: a doubling of data[J]. Nucleic Acids Research, 2015, 43(D1): D227-D233. |
89 | LEWIS T E, SILLITOE I, DAWSON N, et al. Gene3D: extensive prediction of globular domains in proteins[J]. Nucleic Acids Research, 2018, 46(D1): D1282. |
90 | MARCHLER-BAUER A, BO Y, HAN L Y, et al. CDD/SPARCLE: functional classification of proteins via subfamily domain architectures[J]. Nucleic Acids Research, 2017, 45(D1): D200-D203. |
91 | ZHOU J, CUI G Q, HU S D, et al. Graph neural networks: a review of methods and applications[J]. AI Open, 2020, 1: 57-81. |
92 | LIN Z M, AKIN H, RAO R S, et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction[EB/OL]. bioRxiv, 2022[2023-02-01]. |
93 | JUMPER J, EVANS R, PRITZEL A, et al. Highly accurate protein structure prediction with AlphaFold[J]. Nature, 2021, 596(7873): 583-589. |
94 | JING B W, EISMANN S, SURIANA P, et al. Learning from protein structure with geometric vector perceptrons[EB/OL]. arXiv, 2020: 2009.01411[2023-02-01]. |
95 | YUN S J, JEONG M Y, KIM R Y, et al. Graph transformer networks[C/OL]//Advances in Neural Information Processing Systems 32-NeurIPS 2019[2023-02-01]. |
96 | CHEN T, KORNBLITH S, NOROUZI M, et al. A simple framework for contrastive learning of visual representations[C]//Proceedings of the 37th International Conference on Machine Learning. New York: ACM, 2020: 1597-1607. |
97 | ZHU Y H, ZHANG C X, YU D J, et al. Integrating unsupervised language model with triplet neural networks for protein gene ontology prediction[J]. PLoS Computational Biology, 2022, 18(12): e1010793. |
98 | ZHENG S J, RAO J H, SONG Y, et al. PharmKG: a dedicated knowledge graph benchmark for bomedical data mining[J]. Briefings in Bioinformatics, 2021, 22(4): bbaa344. |