Synthetic Biology Journal, 2023, Vol. 4, Issue (3): 524-534. DOI: 10.12211/2096-8280.2023-009
• Invited Review •
Liqi KANG1,2, Pan TAN3, Liang HONG1,2
Received: 2023-01-16
Revised: 2023-03-29
Online: 2023-07-05
Published: 2023-06-30
Contact: Pan TAN, Liang HONG
Liqi KANG, Pan TAN, Liang HONG. Enzyme engineering in the age of artificial intelligence[J]. Synthetic Biology Journal, 2023, 4(3): 524-534.
URL: https://synbioj.cip.com.cn/EN/10.12211/2096-8280.2023-009
Protein fitness category | Dataset | ESM-IF1 | ESM-1v | MSA transformer | ProGen2 | Tranception |
---|---|---|---|---|---|---|
Catalytic activity | B3VI55_LIPST | 0.291 | 0.272 | 0.316 | 0.239 | 0.290 |
Catalytic activity | MTH3_HAEAESTABILIZED | 0.423 | 0.488 | 0.564 | 0.507 | 0.479 |
Catalytic activity | KKA2_KLEPN | 0.204 | 0.198 | 0.153 | 0.191 | 0.077 |
Catalytic activity | MK01_HUMAN | 0.155 | 0.164 | 0.182 | 0.200 | 0.203 |
Catalytic activity | AMIE_PSEAE | 0.295 | 0.537 | 0.523 | 0.526 | 0.517 |
Catalytic activity | RASH_HUMAN | 0.070 | 0.131 | 0.089 | 0.135 | 0.085 |
Catalytic activity | UBC9_HUMAN | 0.485 | 0.518 | 0.425 | 0.484 | 0.476 |
Catalytic activity | BG_STRSQ | 0.665 | 0.670 | 0.727 | 0.749 | 0.656 |
Catalytic activity | TRPC_THEMA | 0.392 | 0.488 | 0.462 | 0.397 | 0.444 |
Catalytic activity | TIM_SULSO | 0.506 | 0.617 | 0.613 | 0.529 | 0.594 |
Catalytic activity | P84126_THETH | 0.519 | 0.564 | 0.656 | 0.558 | 0.548 |
Catalytic activity | BLAT_ECOLX | 0.673 | 0.692 | 0.538 | 0.601 | 0.622 |
Stability | PTEN_HUMAN | 0.559 | 0.458 | 0.366 | 0.471 | 0.430 |
Stability | TPMT_HUMAN | 0.560 | 0.531 | 0.530 | 0.458 | 0.478 |
Peptide binding | DLG4_RAT | 0.468 | 0.531 | 0.224 | 0.361 | 0.446 |
Peptide binding | WW | 0.415 | 0.399 | 0.441 | 0.309 | 0.563 |
Protein binding | IF1_ECOLI | 0.337 | 0.356 | 0.363 | 0.246 | 0.347 |
Protein binding | SUMO1_HUMAN | 0.543 | 0.548 | 0.565 | 0.482 | 0.423 |
Protein binding | RL40B_YEAST | 0.372 | 0.365 | 0.647 | 0.473 | 0.455 |
DNA binding | FOSJUN | 0.532 | 0.464 | 0.366 | 0.515 | 0.536 |
DNA binding | GAL4_YEAST | 0.326 | 0.476 | 0.386 | 0.468 | 0.458 |
RNA binding | RRM | 0.443 | 0.536 | 0.509 | 0.512 | 0.407 |
RNA binding | TDP43 | 0.158 | 0.026 | 0.117 | 0.013 | 0.125 |
IgG binding | GB1 | 0.337 | 0.105 | 0.329 | 0.232 | 0.254 |
Average | | 0.413 | 0.453 | 0.438 | 0.376 | 0.374 |
Table 1 Spearman correlations of protein fitness predicted by unsupervised models
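Table 1 reports Spearman correlations, but this page does not describe the scoring procedure. As a rough illustration only, the sketch below shows how a Spearman rank correlation between model scores and measured fitness values can be computed. The function names (`average_ranks`, `pearson`, `spearman`) and the toy numbers are hypothetical and are not taken from the paper or from any of the benchmarked models.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(predicted), average_ranks(measured))


# Hypothetical toy data: one model score and one measured fitness per mutant.
predicted_scores = [-2.1, -0.3, -1.5, 0.4, -0.9]
measured_fitness = [0.10, 0.80, 0.35, 0.95, 0.90]

rho = spearman(predicted_scores, measured_fitness)
print(f"{rho:.3f}")  # prints 0.900
```

Because only the rank order matters, the score can be on any scale (for example a log-likelihood), which is one reason rank correlation suits comparisons across models with different output ranges.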