深度学习在蛋白质功能预测中的应用

[2]

RADIVOJAC

P

, CLARK

W T

, ORON

T R

, et al.

A large-scale evaluation of computational protein function prediction

[J]. Nature Methods, 2013, 10(3): 221-227.

[3]

ISRALEWITZ

B

, BAUDRY

J

, GULLINGSRUD

J

, et al.

Steered molecular dynamics investigations of protein function

[J]. Journal of Molecular Graphics & Modelling, 2001, 19(1): 13-25.

[4]

KLEPEIS

J L

, LINDORFF-LARSEN

K

, DROR

R O

, et al.

Long-timescale molecular dynamics simulations of protein structure and function

[J]. Current Opinion in Structural Biology, 2009, 19(2): 120-127.

[5]

PIERRI

C L

, PARISI

G

, PORCELLI

V

.

Computational approaches for protein function prediction: a combined strategy from multiple sequence alignment to molecular docking-based virtual screening

[J]. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics, 2010, 1804(9): 1695-1712.

[6]

YUAN

Q M

, CHEN

S

, RAO

J H

, et al.

AlphaFold2-aware protein-DNA binding site prediction using graph transformer

[J]. Briefings in Bioinformatics, 2022, 23(2): bbab564.

[本文引用: 5]

[7]

XIA

Y

, XIA

C Q

, PAN

X Y

, et al.

GraphBind: protein structural context embedded rules learned by hierarchical graph neural networks for recognizing nucleic-acid-binding residues

[J]. Nucleic Acids Research, 2021, 49(9): e51.

[8]

YUAN

Q M

, CHEN

J W

, ZHAO

H Y

, et al.

Structure-aware protein-protein interaction site prediction using deep graph convolutional network

[J]. Bioinformatics, 2021, 38(1): 125-132.

[9]

KULMANOV

M

, HOEHNDORF

R

.

DeepGOPlus: improved protein function prediction from sequence

[J]. Bioinformatics, 2020, 36(2): 422-429.

[本文引用: 4]

[10]

ZHANG

J

, KURGAN

L

.

Review and comparative assessment of sequence-based predictors of protein-binding residues

[J]. Briefings in Bioinformatics, 2018, 19(5): 821-837.

[11]

KUZMANOV

U

, EMILI

A

.

Protein-protein interaction networks: probing disease mechanisms using model systems

[J]. Genome Medicine, 2013, 5(4): 37.

[12]

WELLS

J A

, MCCLENDON

C L

.

Reaching for high-hanging fruit in drug discovery at protein–protein interfaces

[J]. Nature, 2007, 450(7172): 1001-1009.

[13]

LI

Y W

, GOLDING

G B

, ILIE

L

.

DELPHI: accurate deep ensemble model for protein interaction sites prediction

[J]. Bioinformatics, 2021, 37(7): 896-904.

[14]

ABDIN

O

, NIM S, WEN

H

, et al.

PepNN: a deep attention model for the identification of peptide binding sites

[J]. Communications Biology, 2022, 5: 503.

[15]

CHEN

J W

, XIE

Z R

, WU

Y H

.

Understand protein functions by comparing the similarity of local structural environments

[J]. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics, 2017, 1865(2): 142-152.

[16]

LIN

Y F

, CHENG

C W

, SHIH

C S

, et al.

MIB: metal ion-binding site prediction and docking server

[J]. Journal of Chemical Information and Modeling, 2016, 56(12): 2287-2291.

[17]

XIA

C Q

, PAN

X Y

, SHEN

H B

.

Protein-ligand binding residue prediction enhancement through hybrid deep heterogeneous learning of sequence and structure data

[J]. Bioinformatics, 2020, 36(10): 3018-3027.

[18]

YANG

J Y

, ROY

A

, ZHANG

Y

.

Protein-ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment

[J]. Bioinformatics, 2013, 29(20): 2588-2595.

[19]

HU

X Z

, DONG

Q W

, YANG

J Y

, et al.

Recognizing metal and acid radical ion-binding sites by integrating ab initio modeling with template-based transferals

[J]. Bioinformatics, 2016, 32(21): 3260-3269.

[20]

ASHBURNER

M

, BALL

C A

, BLAKE

J A

, et al.

Gene ontology: tool for the unification of biology

[J]. Nature Genetics, 2000, 25(1): 25-29.

[21]

DAVIS

J

, GOADRICH

M

.

The relationship between Precision-Recall and ROC curves

[C]//Proceedings of the 23rd international conference on Machine learning. June 25 - 29, 2006, Pittsburgh, Pennsylvania, USA. New York: ACM, 2006: 233-240.

[22]

CONESA

A

, GÖTZ

S

, GARCÍA-GÓMEZ

J M

, et al.

Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research

[J]. Bioinformatics, 2005, 21(18): 3674-3676.

[23]

YOU

R H

, ZHANG

Z H

, XIONG

Y

, et al.

GOLabeler: improving sequence-based large-scale protein function prediction by learning to rank

[J]. Bioinformatics, 2018, 34(14): 2465-2473.

[24]

LI

H

.

A short introduction to learning to rank

[J]. IEICE Transactions on Information and Systems, 2011, E94-D(10): 1854-1862.

[25]

CAO

Y

, SHEN

Y

.

TALE: Transformer-based protein function Annotation with joint sequence-Label Embedding

[J]. Bioinformatics, 2021, 37(18): 2825-2833.

[26]

GLIGORIJEVIĆ

V

, DOUGLAS RENFREW

P

, KOSCIOLEK

T

, et al.

Structure-based protein function prediction using graph convolutional networks

[J]. Nature Communications, 2021, 12: 3168.

[本文引用: 4]

[27]

OLIVER

S

.

Guilt-by-association goes global

[J]. Nature, 2000, 403(6770): 601-602.

[28]

YOU

R H

, YAO

S W

, XIONG

Y

, et al.

NetGO: improving large-scale protein function prediction with massive network information

[J]. Nucleic Acids Research, 2019, 47(W1): W379-W387.

[29]

SZKLARCZYK

D

, GABLE

A L

, NASTOU

K C

, et al.

The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets

[J]. Nucleic Acids Research, 2021, 49(D1): D605-D612.

[30]

YAO

S W

, YOU

R H

, WANG

S J

, et al.

NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information

[J]. Nucleic Acids Research, 2021, 49(W1): W469-W475.

[31]

WANG

S Y

, LIANG

K

, HU

Q S

, et al.

JAK2-binding long noncoding RNA promotes breast cancer brain metastasis

[J]. The Journal of Clinical Investigation, 2017, 127(12): 4498-4515.

[32]

TIRALONGO

J

, COOPER

O

, LITFIN

T

, et al.

YesU from Bacillus subtilis preferentially binds fucosylated glycans

[J]. Scientific Reports, 2018, 8: 13139.

[33]

KUMAR

R

, CORBETT

M A

, VAN

BON B W M

, et al.

THOC2 mutations implicate mRNA-export pathway in X-linked intellectual disability

[J]. The American Journal of Human Genetics, 2015, 97(2): 302-310.

[34]

SCHMIDTKE

P

, BARRIL

X

.

Understanding and predicting druggability. A high-throughput method for detection of drug binding sites

[J]. Journal of Medicinal Chemistry, 2010, 53(15): 5858-5867.

[35]

XU

M Y

, RAN

T

, CHEN

H M

.

De novo molecule design through the molecular generative model conditioned by 3D information of protein binding sites

[J]. Journal of Chemical Information and Modeling, 2021, 61(7): 3240-3254.

[36]

HEFFERNAN

R

, YANG

Y D

, PALIWAL

K

, et al.

Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility

[J]. Bioinformatics, 2017, 33(18): 2842-2849.

[37]

ALTSCHUL

S F

, MADDEN

T L

, SCHÄFFER

A A

, et al.

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs

[J]. Nucleic Acids Research, 1997, 25(17): 3389-3402.

[38]

SUZEK

B E

, HUANG

H Z

, MCGARVEY

P

, et al.

UniRef: comprehensive and non-redundant UniProt reference clusters

[J]. Bioinformatics, 2007, 23(10): 1282-1288.

[39]

REMMERT

M

, BIEGERT

A

, HAUSER

A

, et al.

HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment

[J]. Nature Methods, 2012, 9(2): 173-175.

[40]

MIRDITA

M

, VON DEN DRIESCH

L

, GALIEZ

C

, et al.

Uniclust databases of clustered and deeply annotated protein sequences and alignments

[J]. Nucleic Acids Research, 2017, 45(D1): D170-D176.

[41]

MEILER

J

, MÜLLER

M

, ZEIDLER

A

, et al.

Generation and evaluation of dimension-reduced amino acid parameter representations by artificial neural networks

[J]. Molecular Modeling Annual, 2001, 7(9): 360-369.

[42]

RIVES

A

, MEIER

J

, SERCU

T

, et al.

Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences

[J]. Proceedings of the National Academy of Sciences of the United States of America, 2021, 118(15): e2016239118.

[43]

ELNAGGAR

A

, HEINZINGER

M

, DALLAGO

C

, et al.

ProtTrans: towards cracking the language of life's code through self-supervised deep learning and high performance computing

"[EB/OL]. arXiv, 2020: 2007.06225[2023-02-01]. .

[44]

KABSCH

W

, SANDER

C

.

Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features

[J]. Biopolymers, 1983, 22(12): 2577-2637.

[45]

POROLLO

A

, MELLER

J

.

Prediction-based fingerprints of protein-protein interactions

[J]. Proteins: Structure, Function, and Bioinformatics, 2007, 66(3): 630-645.

[46]

ZHANG

J

, KURGAN

L

.

SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences

[J]. Bioinformatics, 2019, 35(14): i343-i353.

[47]

ZENG

M

, ZHANG

F H

, WU

F X

, et al.

Protein-protein interaction site prediction through combining local and global features with deep neural networks

[J]. Bioinformatics, 2020, 36(4): 1114-1120.

[48]

GAINZA

P

, SVERRISSON

F

, MONTI

F

, et al.

Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning

[J]. Nature Methods, 2020, 17(2): 184-192.

[49]

TAHERZADEH

G

, YANG

Y D

, ZHANG

T

, et al.

Sequence-based prediction of protein-peptide binding sites using support vector machine

[J]. Journal of Computational Chemistry, 2016, 37(13): 1223-1229.

[50]

ZHAO

Z J

, PENG

Z L

, YANG

J Y

.

Improving sequence-based prediction of protein-peptide binding residues by introducing intrinsic disorder and a consensus method

[J]. Journal of Chemical Information and Modeling, 2018, 58(7): 1459-1468.

[51]

WARDAH

W

, DEHZANGI

A

, TAHERZADEH

G

, et al.

Predicting protein-peptide binding sites with a deep convolutional neural network

[J]. Journal of Theoretical Biology, 2020, 496: 110278.

[52]

ZHU

Y H

, HU

J

, SONG

X N

, et al.

DNAPred: accurate identification of DNA-binding sites from protein sequence by ensembled hyperplane-distance-based support vector machines

[J]. Journal of Chemical Information and Modeling, 2019, 59(6): 3057-3071.

[53]

SU

H

, LIU

M C

, SUN

S S

, et al.

Improving the prediction of protein-nucleic acids binding residues via multiple sequence profiles and the consensus of complementary methods

[J]. Bioinformatics, 2019, 35(6): 930-936.

[54]

WU

Q

, PENG

Z L

, ZHANG

Y

, et al.

COACH-D: Improved protein-ligand binding sites prediction with refined ligand-binding poses through molecular docking

[J]. Nucleic Acids Research, 2018, 46(W1): W438-W442.

[55]

ZHANG

J

, CHEN

Q C

, LIU

B

.

NCBRPred: predicting nucleic acid binding residues in proteins based on multilabel learning

[J]. Briefings in Bioinformatics, 2021, 22(5): bbaa397.

[56]

YU

D J

, HU

J

, YANG

J

, et al.

Designing template-free predictor for targeting protein-ligand binding sites with classifier ensemble and spatial clustering

[J]. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013, 10(4): 994-1008.

[57]

ROY

A

, YANG

J Y

, ZHANG

Y

.

COFACTOR: an accurate comparative algorithm for structure-based protein function annotation

[J]. Nucleic Acids Research, 2012, 40(W1): W471-W477.

[58]

OFER

D

, LINIAL

M

.

ProFET: feature engineering captures high-level protein functions

[J]. Bioinformatics, 2015, 31(21): 3429-3436.

[59]

KOZLOVSKII

I

, POPOV

P

.

Protein-peptide binding site detection using 3D convolutional neural networks

[J]. Journal of Chemical Information and Modeling, 2021, 61(8): 3814-3823.

[60]

CHO

K

, VAN MERRIENBOER

B

, GULCEHRE

C

, et al.

Learning phrase representations using RNN encoder-decoder for statistical machine translation

"[EB/OL]. arXiv, 2014: 1406.1078[2023-02-01]. .

[61]

GRAVES

A

.

Long short-term memory

[M]//Studies in Computational Intelligence: Supervised sequence labelling with recurrent neural networks. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012: 37-45.

[62]

LECUN

Y

, BENGIO

Y

.

Convolutional networks for images, speech, and time series

[M/OL]//The handbook of brain theory and neural networks. Cambridge, MA, USA: MIT Press, 1995[2023-02-01]. .

[63]

YUAN

Q M

, CHEN

S

, WANG

Y

, et al.

Alignment-free metal ion-binding site prediction from protein sequence through pretrained language model and multi-task learning

[J]. Briefings in Bioinformatics, 2022, 23(6): bbac444.

[64]

VASWANI

A

, SHAZEER

N

, PARMAR

N

, et al.

Attention is all you need. Advances in neural information processing systems

[C/OL]//Advances in Neural Information Processing Systems 30 - NeurIPS 2017[2023-02-01]. .

[65]

ZHENG

S J

, RAO

J H

, ZHANG

Z Y

, et al.

Predicting retrosynthetic reactions using self-corrected transformer neural networks

[J]. Journal of Chemical Information and Modeling, 2020, 60(1): 47-55.

[66]

FINN

C

, ABBEEL

P

, LEVINE

S

.

Model-agnostic meta-learning for fast adaptation of deep networks

[C]//Proceedings of the 34th International Conference on Machine Learning - Volume 70. August 6 - 11, 2017, Sydney, NSW, Australia. New York: ACM, 2017: 1126-1135.

[67]

WANG

J H

, ZHENG

S J

, CHEN

J W

, et al.

Meta learning for low-resource molecular optimization

[J]. Journal of Chemical Information and Modeling, 2021, 61(4): 1627-1636.

[68]

SUN

Z

, ZHENG

S J

, ZHAO

H Y

, et al.

To improve prediction of binding residues with DNA, RNA, carbohydrate, and peptide via multi-task deep neural networks

[J]. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2022, 19(6): 3735-3743.

[69]

ZHANG

F H

, ZHAO

B

, SHI

W B

, et al.

DeepDISOBind: accurate prediction of RNA-, DNA- and protein-binding intrinsically disordered residues with deep multi-task learning

[J]. Briefings in Bioinformatics, 2022, 23(1): bbab521.

[70]

ZHANG

Y

, YANG

Q

.

An overview of multi-task learning

[J]. National Science Review, 2018, 5(1): 30-43.

[71]

CARUANA

R

.

Multitask learning

[J].Machine Learning, 1997, 28(1): 41-75.

[72]

MERINO

G A

, SAIDI

R

, MILONE

D H

, et al.

Hierarchical deep learning for predicting GO annotations by integrating protein knowledge

[J]. Bioinformatics, 2022, 38(19): 4488-4496.

[73]

ZHANG

C X

, FREDDOLINO

P L

, ZHANG

Y

.

COFACTOR: improved protein function prediction by combining structure, sequence and protein-protein interaction information

[J]. Nucleic Acids Research, 2017, 45(W1): W291-W299.

[74]

KULMANOV

M

, KHAN

M A

, HOEHNDORF

R

.

DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier

[J]. Bioinformatics, 2018, 34(4): 660-668.

[75]

LAI

B Q

, XU

J B

.

Accurate protein function prediction via graph attention networks with predicted structure information

[J]. Briefings in Bioinformatics, 2022, 23(1): bbab502.

[76]

XU

J B

, MCPARTLON

M

, LI

J

.

Improved protein structure prediction by deep learning irrespective of co-evolution information

[J]. Nature Machine Intelligence, 2021, 3(7): 601-609.

[77]

ALTSCHUL

S F

, GISH

W

, MILLER

W

, et al.

Basic local alignment search tool

[J]. Journal of Molecular Biology, 1990, 215(3): 403-410.

[78]

VILLEGAS-MORCILLO

A

, MAKRODIMITRIS

S

, VAN

HAM R C H J

, et al.

Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function

[J]. Bioinformatics, 2021, 37(2): 162-170.

[79]

VELIČKOVIĆ

P

, CUCURULL

G

, CASANOVA

A

, et al.

Graph attention networks

[EB/OL]. arXiv, 2017[2023-02-01]. .

[80]

LEE

J Y

, LEE

I Y

, KANG

J W

.

Self-attention graph pooling

[C/OL]. Proceedings of the 22nd international conference on Machine learning, 9-15 June 2019, Long Beach, California, USA, 97:3734-3743[2023-02-01]. .

[81]

BOUTET

E

, LIEBERHERR

D

, TOGNOLLI

M

, et al.

UniProtKB/Swiss-prot, the manually annotated section of the UniProt KnowledgeBase: how to use the entry view

[M]//Plant Bioinformatics. New York: Springer New York, 2016: 23-54.

[82]

TORRES

M

, YANG

H X

, ROMERO

A E

, et al.

Protein function prediction for newly sequenced organisms

[J]. Nature Machine Intelligence, 2021, 3(12): 1050-1060.

[83]

MOSTAFAVI

S

, RAY D, WARDE-FARLEY

D

, et al.

GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function

[J]. Genome Biology, 2008, 9(): S4.

[84]

YOU

R H

, YAO

S W

, MAMITSUKA

H

, et al.

DeepGraphGO: graph neural network for large-scale, multispecies protein function prediction

[J]. Bioinformatics, 2021, 37(): i262-i271.

[85]

KIPF

T N

, WELLING

M

.

Semi-supervised classification with graph convolutional networks

[EB/OL]. arXiv, 2016: 1609.02907[2023-02-01]. .

[86]

MITCHELL

A L

, ATTWOOD

T K

, BABBITT

P C

, et al.

InterPro in 2019: improving coverage, classification and access to protein sequence annotations

[J]. Nucleic Acids Research, 2019, 47(D1): D351-D360.

[87]

FINN

R D

, COGGILL

P

, EBERHARDT

R Y

, et al.

The Pfam protein families database: towards a more sustainable future

[J]. Nucleic Acids Research, 2016, 44(D1): D279-D285.

[88]

OATES

M E

, STAHLHACKE

J

, VAVOULIS

D V

, et al.

The SUPERFAMILY 1.75 database in 2014: a doubling of data

[J]. Nucleic Acids Research, 2015, 43(D1): D227-D233.

[89]

LEWIS

T E

, SILLITOE

I

, DAWSON

N

, et al.

Gene3D: extensive prediction of globular domains in proteins

[J]. Nucleic Acids Research, 2018, 46(D1): D1282.

[90]

MARCHLER-BAUER

A

, BO

Y

, HAN

L Y

, et al.

CDD/SPARCLE: functional classification of proteins via subfamily domain architectures

[J]. Nucleic Acids Research, 2017, 45(D1): D200-D203.

[91]

ZHOU

J

, CUI

G Q

, HU

S D

, et al.

Graph neural networks: A review of methods and applications

[J]. AI Open, 2020, 1: 57-81.

[92]

LIN

Z M

, AKIN

H

, RAO

R S

, et al.,

Language models of protein sequences at the scale of evolution enable accurate structure prediction

[EB/OL]. bioRxiv, 2022[2023-02-01]..

[93]

JUMPER

J

, EVANS

R

, PRITZEL

A

, et al.

Highly accurate protein structure prediction with AlphaFold

[J]. Nature, 2021, 596(7873): 583-589.

[94]

JING

B W

, EISMANN

S

, SURIANA

P

, et al.

Learning from protein structure with geometric vector perceptrons

[EB/OL]. arXiv, 2020: 2009.01411[2023-02-01]. .

[95]

YUN

S J

, JEONG

M Y

, KIM

R Y

, et al.

Graph transformer networks

[C/OL]//Advances in Neural Information Processing Systems 32 - NeurIPS 2019[2023-02-01]. .

[96]

CHEN

T

, KORNBLITH

S

, NOROUZI

M

, et al.

A simple framework for contrastive learning of visual representations

[C]//Proceedings of the 37th International Conference on Machine Learning. New York: ACM, 2020: 1597-1607.

[97]

ZHU

Y H

, ZHANG

C X

, YU

D J

, et al.

Integrating unsupervised language model with triplet neural networks for protein gene ontology prediction

[J]. PLoS Computational Biology, 2022, 18(12): e1010793.

[98]

ZHENG

S J

, RAO

J H

, SONG

Y

, et al.

PharmKG: a dedicated knowledge graph benchmark for bomedical data mining

[J]. Briefings in Bioinformatics, 2021, 22(4): bbaa344.