Synthetic Biology Journal ›› 2025, Vol. 6 ›› Issue (3): 617-635. DOI: 10.12211/2096-8280.2025-044
SONG Chengzhi1, LIN Yihan1,2,3
Received: 2025-05-12
Revised: 2025-06-06
Online: 2025-06-30
Published: 2025-06-27
Contact: LIN Yihan
About the author: SONG Chengzhi (b. 1998), male, PhD candidate. Research interests include systems biology, synthetic biology, and biophysics. E-mail: czsong@stu.pku.edu.cn
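Abstract:
Directed evolution is one of the core enabling technologies of synthetic biology. By mimicking natural evolution in the laboratory, it uses functional screening to recover progressively improved protein sequences from large libraries of mutant sequences, enabling functions that wild-type proteins struggle to achieve. In recent years, rapidly developing artificial intelligence (AI) methods such as machine learning and protein language models have broadened the range of applications of directed evolution and improved its efficiency, helping it perform well in the engineering of enzymes, antibodies, biosensors, and other proteins. This review summarizes the typical strategies that conventional directed evolution uses for mutant library construction and functional screening, introduces the efficient continuous directed evolution platforms developed in recent years, and then discusses the limitations of directed evolution, such as its restricted coverage of sequence space and its tendency to become trapped in local optima. Combining rapidly iterating machine learning models with directed evolution can, on the one hand, ease the limits on sequence-space exploration and, on the other, improve the experimental workflow along several dimensions, including starting-sequence design, intermediate library optimization, and extraction of functional information, making protein engineering more efficient. To clarify the application potential of combining directed evolution with machine learning, this review highlights representative cases of machine learning-assisted directed evolution. Finally, it briefly discusses potential challenges and future directions for the field.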
SONG Chengzhi, LIN Yihan. AI-enabled directed evolution for protein engineering and optimization[J]. Synthetic Biology Journal, 2025, 6(3): 617-635.
Fig. 2 Illustration of selected continuous directed evolution platforms. Panels (a)-(d) correspond to PACE[46], VEGAS[52], REPLACE[42], and OrthoRep[29], respectively. The symbol X denotes mutations arising in the target fragment.
Protein language model | Year | Parameters | Training data size | Architecture |
---|---|---|---|---|
ProtVec | 2015 | — | Swiss-Prot 0.55M sequences | Word2Vec |
SeqVec | 2019 | 93M | UniRef50 33M sequences | BiLSTM |
UniRep | 2019 | 18M | UniRef50 24M sequences | mLSTM |
TAPE (Transformer) | 2019 | 38M | Pfam 31M protein domain sequences | Encoder-only |
ProGen | 2020 | 1.2B | 280M sequences from 5 sources | Decoder-only |
ESM-1b | 2021 | 650M | UniRef50 27.1M sequences | Encoder-only |
ESM-1v | 2021 | 650M | UniRef90 98M sequences | Encoder-only |
MSA Transformer | 2021 | 100M | 26M multiple sequence alignments | Axial Transformer |
ProteinBERT | 2022 | 16M | UniRef90 106M sequences | Encoder-only |
ProtGPT2 | 2022 | 738M | UniRef50 45M sequences | Decoder-only |
ProtT5 (ProtTrans) | 2022 | 3B (XL) / 11B (XXL) | BFD + UniRef50 2.3B sequences | Encoder-Decoder |
ESM-2 | 2023 | 8M~15B | UniRef50 60M sequences | Encoder-only |
ProGen2 | 2023 | 151M~6.4B | UniRef90 + BFD30 1B sequences | Decoder-only |
Ankh | 2023 | 450M (base); 1.15B (large) | UniRef50 45M sequences | Encoder-Decoder |
PoET | 2023 | 604M | UniRef50 29M sets of homologous sequences | Decoder-only |
ESM3 | 2024 | 1.4B~98B | 3.15B protein sequences, 236M protein structures, and 539M proteins’ function annotations | Encoder-only |
CARP | 2024 | 600k~640M | UniRef50 42M sequences | CNN |
ProLLaMA | 2024 | 7B | UniRef50 48M sequences | Decoder-only |
xTrimoPGLM | 2024 | 1B~100B | UniRef90 + ColabFoldDB 939M protein sequences | Transformer |
Prot42 | 2025 | 500M (base); 1.1B (large) | UniRef50 57M sequences | Decoder-only |
T5ProtChem | 2025 | 102M | UniRef50 52M protein sequences and PubChem 97M SMILES sequences | Encoder-Decoder |
LC-PLM | 2025 | 1.4B | UniRef50 63M sequences | BiMamba |
Table 1 Summary of protein language models
Fig. 3 General workflow of machine learning-assisted directed evolution (MLDE) and case studies. (a) General MLDE workflow; (b)-(e) applications of MLDE to the optimization of enzymes[121], antibodies[125], and transcription factors[59]. Panel (e) shows the improvement in fold repression for the top-10 scoring AmeR mutants obtained with the DMS-based and EvoAI approaches, respectively.
61 | WEINREICH D M, DELANEY N F, DEPRISTO M A, et al. Darwinian evolution can follow only very few mutational paths to fitter proteins[J]. Science, 2006, 312(5770): 111-114. |
62 | PODGORNAIA A I, LAUB M T. Pervasive degeneracy and epistasis in a protein-protein interface[J]. Science, 2015, 347(6222): 673-677. |
63 | FOX R, ROY A, GOVINDARAJAN S, et al. Optimizing the search algorithm for protein engineering by directed evolution[J]. Protein Engineering Design and Selection, 2003, 16(8): 589-597. |
64 | FOX R J, DAVIS S C, MUNDORFF E C, et al. Improving catalytic function by ProSAR-driven enzyme evolution[J]. Nature Biotechnology, 2007, 25(3): 338-344. |
65 | ROMERO P A, KRAUSE A, ARNOLD F H. Navigating the protein fitness landscape with Gaussian processes[J]. Proceedings of the National Academy of Sciences of the United States of America, 2013, 110(3): E193-E201. |
66 | OTWINOWSKI J, PLOTKIN J B. Inferring fitness landscapes by regression produces biased estimates of epistasis[J]. Proceedings of the National Academy of Sciences of the United States of America, 2014, 111(22): E2301-E2309. |
67 | OFER D, BRANDES N, LINIAL M. The language of proteins: NLP, machine learning & protein sequences[J]. Computational and Structural Biotechnology Journal, 2021, 19: 1750-1758. |
68 | FERRUZ N, HÖCKER B. Controllable protein design with language models[J]. Nature Machine Intelligence, 2022, 4(6): 521-532. |
69 | ASGARI E, MOFRAD M R K. Continuous distributed representation of biological sequences for deep proteomics and genomics[J]. PLoS One, 2015, 10(11): e0141287. |
70 | HEINZINGER M, ELNAGGAR A, WANG Y, et al. Modeling aspects of the language of life through transfer-learning protein sequences[J]. BMC Bioinformatics, 2019, 20(1): 723. |
71 | ALLEY E C, KHIMULYA G, BISWAS S, et al. Unified rational protein engineering with sequence-based deep representation learning[J]. Nature Methods, 2019, 16(12): 1315-1322. |
72 | RAO R, BHATTACHARYA N, THOMAS N, et al. Evaluating protein transfer learning with TAPE[C]//Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2019, 32: 9689-9701 [2025-06-03]. |
73 | MADANI A, KRAUSE B, GREENE E R, et al. Large language models generate functional protein sequences across diverse families[J]. Nature Biotechnology, 2023, 41(8): 1099-1106. |
74 | MADANI A, MCCANN B, NAIK N, et al. ProGen: language modeling for protein generation[EB/OL]. arXiv, 2020: 2004.03497. (2020-03-08)[2025-06-03]. |
75 | RIVES A, MEIER J, SERCU T, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences[J]. Proceedings of the National Academy of Sciences of the United States of America, 2021, 118(15): e2016239118. |
76 | MEIER J, RAO R, VERKUIL R, et al. Language models enable zero-shot prediction of the effects of mutations on protein function[C/OL]//Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34: 29287-29303 [2025-06-03]. |
77 | RAO R M, LIU J, VERKUIL R, et al. MSA transformer[C]//Proceedings of the 38th International Conference on Machine Learning, PMLR, 2021, 139: 8844-8856 [2025-06-03]. |
78 | BRANDES N, OFER D, PELEG Y, et al. ProteinBERT: a universal deep-learning model of protein sequence and function[J]. Bioinformatics, 2022, 38(8): 2102-2110. |
79 | FERRUZ N, SCHMIDT S, HÖCKER B. ProtGPT2 is a deep unsupervised language model for protein design[J]. Nature Communications, 2022, 13: 4348. |
80 | ELNAGGAR A, HEINZINGER M, DALLAGO C, et al. ProtTrans: toward understanding the language of life through self-supervised learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(10): 7112-7127. |
81 | LIN Z M, AKIN H, RAO R, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model[J]. Science, 2023, 379(6637): 1123-1130. |
82 | NIJKAMP E, RUFFOLO J A, WEINSTEIN E N, et al. ProGen2: exploring the boundaries of protein language models[J]. Cell Systems, 2023, 14(11): 968-978.e3. |
83 | ELNAGGAR A, ESSAM H, SALAH-ELDIN W, et al. Ankh: optimized protein language model unlocks general-purpose modelling[EB/OL]. arXiv, 2023: 2301.06568. (2023-01-16)[2025-06-03]. |
84 | TRUONG T F JR, BEPLER T. PoET: a generative model of protein families as sequences-of-sequences[C/OL]//Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023: 2306.06156[2025-06-03]. |
1 | LERNER S A, WU T T, LIN E C. Evolution of a catabolic pathway in bacteria[J]. Science, 1964, 146(3649): 1313-1315. |
2 | HALL B G. Experimental evolution of a new enzymatic function. Kinetic analysis of the ancestral (ebg o) and evolved (ebg +) enzymes[J]. Journal of Molecular Biology, 1976, 107(1): 71-84. |
3 | LEUNG D W, CHEN E, GOEDDEL D V. A method for random mutagenesis of a defined DNA segment using a modified polymerase chain reaction[J]. Technique JMCMB, 1989, 1: 11-15. |
4 | CHEN K Q, ARNOLD F H. Enzyme engineering for nonaqueous solvents: random mutagenesis to enhance activity of subtilisin E in polar organic media[J]. Nature Biotechnology, 1991, 9(11): 1073-1077. |
5 | KLENK C, SCRIVENS M, NIEDERER A, et al. A Vaccinia-based system for directed evolution of GPCRs in mammalian cells[J]. Nature Communications, 2023, 14: 1770. |
6 | HUFFMAN M A, FRYSZKOWSKA A, ALVIZO O, et al. Design of an in vitro biocatalytic cascade for the manufacture of islatravir[J]. Science, 2019, 366(6470): 1255-1259. |
7 | TOURNIER V, TOPHAM C M, GILLES A, et al. An engineered PET depolymerase to break down and recycle plastic bottles[J]. Nature, 2020, 580(7802): 216-219. |
8 | SARAI N S, FULTON T J, O’MEARA R L, et al. Directed evolution of enzymatic silicon-carbon bond cleavage in siloxanes[J]. Science, 2024, 383(6681): 438-443. |
9 | RAPPAZZO C G, TSE L V, KAKU C I, et al. Broad and potent activity against SARS-like viruses by an engineered human monoclonal antibody[J]. Science, 2021, 371(6531): 823-829. |
10 | BANACH B B, PLETNEV S, OLIA A S, et al. Antibody-directed evolution reveals a mechanism for enhanced neutralization at the HIV-1 fusion peptide site[J]. Nature Communications, 2023, 14: 7593. |
11 | TABEBORDBAR M, LAGERBORG K A, STANTON A, et al. Directed evolution of a family of AAV capsid variants enabling potent muscle-directed gene delivery across species[J]. Cell, 2021, 184(19): 4919-4938.e22. |
12 | LIN R, ZHOU Y T, YAN T, et al. Directed evolution of adeno-associated virus for efficient gene delivery to microglia[J]. Nature Methods, 2022, 19(8): 976-985. |
13 | CLARKE J, FERSHT A R. Engineered disulfide bonds as probes of the folding pathway of barnase: increasing the stability of proteins against the rate of denaturation[J]. Biochemistry, 1993, 32(16): 4322-4329. |
14 | REA V, KOLKMAN A J, VOTTERO E, et al. Active site substitution A82W improves the regioselectivity of steroid hydroxylation by cytochrome P450 BM3 mutants as rationalized by spin relaxation nuclear magnetic resonance studies[J]. Biochemistry, 2012, 51(3): 750-760. |
85 | HAYES T, RAO R, AKIN H, et al. Simulating 500 million years of evolution with a language model[J]. Science, 2025, 387(6736): 850-858. |
86 | YANG K K, FUSI N, LU A X. Convolutions are competitive with transformers for protein sequence pretraining[J]. Cell Systems, 2024, 15(3): 286-294.e2. |
87 | LV L, LIN Z Y, LI H, et al. ProLLaMA: a protein large language model for multi-task protein language processing[J]. IEEE Transactions on Artificial Intelligence, 2025, PP(99): 1-12. |
88 | CHEN B, CHENG X Y, LI P, et al. xTrimoPGLM: unified 100B-scale pre-trained transformer for deciphering the language of protein[EB/OL]. arXiv, 2024: 2401.06199. (2024-01-11)[2025-06-03]. |
89 | SAYEED M A, TEKIN E, NADEEM M, et al. Prot42: a novel family of protein language models for target-aware protein binder generation[EB/OL]. arXiv, 2025: 2504.04453. (2025-04-06)[2025-06-03]. |
90 | KELLY T, XIA S, LU J Y, et al. Unified deep learning of molecular and protein language representations with T5ProtChem[J]. Journal of Chemical Information and Modeling, 2025, 65(8): 3990-3998. |
91 | WANG Y H, WANG Z C, SADEH G, et al. LC-PLM: long-context protein language modeling using bidirectional mamba with shared projection layers[EB/OL]. arXiv, 2025: 2411.08909. (2024-10-29)[2025-06-03]. |
92 | BISWAS S, KHIMULYA G, ALLEY E C, et al. Low-N protein engineering with data-efficient deep learning[J]. Nature Methods, 2021, 18(4): 389-396. |
93 | SUZEK B E, WANG Y Q, HUANG H Z, et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches[J]. Bioinformatics, 2015, 31(6): 926-932. |
94 | RADFORD A, JOZEFOWICZ R, SUTSKEVER I. Learning to generate reviews and discovering sentiment[EB/OL]. arXiv, 2017: 1704.01444. (2017-04-05)[2025-06-03]. |
95 | HSU C, VERKUIL R, LIU J, et al. Learning inverse folding from millions of predicted structures[C/OL]//Proceedings of the 39th International Conference on Machine Learning, PMLR, 2022, 162: 8946-8970 [2025-06-03]. |
96 | JUMPER J, EVANS R, PRITZEL A, et al. Highly accurate protein structure prediction with AlphaFold[J]. Nature, 2021, 596(7873): 583-589. |
15 | MAITI A, BUFFALO C Z, SAURABH S, et al. Structural and photophysical characterization of the small ultra-red fluorescent protein[J]. Nature Communications, 2023, 14: 4155. |
16 | PACKER M S, LIU D R. Methods for the directed evolution of proteins[J]. Nature Reviews Genetics, 2015, 16(7): 379-394. |
17 | WANG Y J, XUE P, CAO M F, et al. Directed evolution: methodologies and applications[J]. Chemical Reviews, 2021, 121(20): 12384-12444. |
18 | ZACCOLO M, WILLIAMS D M, BROWN D M, et al. An approach to random mutagenesis of DNA using mixtures of triphosphate derivatives of nucleoside analogues[J]. Journal of Molecular Biology, 1996, 255(4): 589-603. |
19 | VANHERCKE T, AMPE C, TIRRY L, et al. Reducing mutational bias in random protein libraries[J]. Analytical Biochemistry, 2005, 339(1): 9-14. |
20 | DENNIG A, SHIVANGE A V, MARIENHAGEN J, et al. OmniChange: the sequence independent method for simultaneous site-saturation of five codons[J]. PLoS One, 2011, 6(10): e26222. |
21 | PÜLLMANN P, ULPINNIS C, MARILLONNET S, et al. Golden Mutagenesis: an efficient multi-site-saturation mutagenesis approach by Golden Gate cloning with automated primer design[J]. Scientific Reports, 2019, 9: 10932. |
22 | ZHAO H M, GIVER L, SHAO Z X, et al. Molecular evolution by staggered extension process (StEP) in vitro recombination[J]. Nature Biotechnology, 1998, 16(3): 258-261. |
23 | COCO W M, LEVINSON W E, CRIST M J, et al. DNA shuffling method for generating highly recombined genes and evolved enzymes[J]. Nature Biotechnology, 2001, 19(4): 354-359. |
24 | SIEBER V, MARTINEZ C A, ARNOLD F H. Libraries of hybrid proteins from distantly related sequences[J]. Nature Biotechnology, 2001, 19(5): 456-460. |
25 | BITTKER J A, LE B V, LIU J M, et al. Directed evolution of protein enzymes using nonhomologous random recombination[J]. Proceedings of the National Academy of Sciences of the United States of America, 2004, 101(18): 7011-7016. |
26 | GREENER A, CALLAHAN M, JERPSETH B. An efficient random mutagenesis technique using an E. coli mutator strain[J]. Molecular Biotechnology, 1997, 7(2): 189-195. |
97 | NOTIN P, KOLLASCH A, RITTER D, et al. ProteinGym: large-scale benchmarks for protein fitness prediction and design[C/OL]//Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023: 64331-64379 [2025-06-03]. |
98 | BADRAN A H, GUZOV V M, HUAI Q, et al. Continuous evolution of Bacillus thuringiensis toxins overcomes insect resistance[J]. Nature, 2016, 533(7601): 58-63. |
99 | GLÖGL M, KRISHNAKUMAR A, RAGOTTE R J, et al. Target-conditioned diffusion generates potent TNFR superfamily antagonists and agonists[J]. Science, 2024, 386(6726): 1154-1161. |
100 | VÁZQUEZ TORRES S, BENARD VALLE M, MACKESSY S P, et al. De novo designed proteins neutralize lethal snake venom toxins[J]. Nature, 2025, 639(8053): 225-231. |
101 | YEH A H W, NORN C, KIPNIS Y, et al. De novo design of luciferases using deep learning[J]. Nature, 2023, 614(7949): 774-780. |
102 | KIPNIS Y, CHAIB A O, VOROBIEVA A A, et al. Design and optimization of enzymatic activity in a de novo β-barrel scaffold[J]. Protein Science, 2022, 31(11): e4405. |
103 | DING K, CHIN M, ZHAO Y L, et al. Machine learning-guided co-optimization of fitness and diversity facilitates combinatorial library design in enzyme engineering[J]. Nature Communications, 2024, 15: 6392. |
104 | FRAM B, SU Y, TRUEBRIDGE I, et al. Simultaneous enhancement of multiple functional properties using evolution-informed protein design[J]. Nature Communications, 2024, 15: 5141. |
105 | WU Z, JENNIFER KAN S B, LEWIS R D, et al. Machine learning-assisted directed protein evolution with combinatorial libraries[J]. Proceedings of the National Academy of Sciences of the United States of America, 2019, 116(18): 8852-8858. |
106 | WU N C, DAI L, OLSON C A, et al. Adaptation in protein fitness landscapes is facilitated by indirect paths[J]. eLife, 2016, 5: e16965. |
107 | CHU H Y, FONG J H C, THEAN D G L, et al. Accurate top protein variant discovery via low-N pick-and-validate machine learning[J]. Cell Systems, 2024, 15(2): 193-203.e6. |
108 | YANG J, LAL R G, BOWDEN J C, et al. Active learning-assisted directed evolution[J]. Nature Communications, 2025, 16: 714. |
27 | BADRAN A H, LIU D R. Development of potent in vivo mutagenesis plasmids with broad mutational spectra[J]. Nature Communications, 2015, 6: 8425. |
28 | MOORE C L, PAPA L J 3RD, SHOULDERS M D. A processive protein chimera introduces mutations across defined DNA regions in vivo[J]. Journal of the American Chemical Society, 2018, 140(37): 11560-11564. |
29 | RAVIKUMAR A, ARZUMANYAN G A, OBADI M K A, et al. Scalable, continuous evolution of genes at mutation rates above genomic error thresholds[J]. Cell, 2018, 175(7): 1946-1957.e13. |
30 | RAVIKUMAR A, ARRIETA A, LIU C C. An orthogonal DNA replication system in yeast[J]. Nature Chemical Biology, 2014, 10(3): 175-177. |
31 | HALPERIN S O, TOU C J, WONG E B, et al. CRISPR-guided DNA polymerases enable diversification of all nucleotides in a tunable window[J]. Nature, 2018, 560(7717): 248-252. |
32 | TOU C J, SCHAFFER D V, DUEBER J E. Targeted diversification in the S. cerevisiae genome with CRISPR-guided DNA polymerase I[J]. ACS Synthetic Biology, 2020, 9(7): 1911-1916. |
33 | ÁLVAREZ B, MENCÍA M, DE LORENZO V, et al. In vivo diversification of target genomic sites using processive base deaminase fusions blocked by dCas9[J]. Nature Communications, 2020, 11: 6436. |
34 | YI X, KHEY J, KAZLAUSKAS R J, et al. Plasmid hypermutation using a targeted artificial DNA replisome[J]. Science Advances, 2021, 7(29): eabg8712. |
35 | CROOK N, ABATEMARCO J, SUN J, et al. In vivo continuous evolution of genes and pathways in yeast[J]. Nature Communications, 2016, 7: 13051. |
36 | CARR P A, WANG H H, STERLING B, et al. Enhanced multiplex genome engineering through co-operative oligonucleotide co-selection[J]. Nucleic Acids Research, 2012, 40(17): e132. |
37 | LEWIS J C, ARNOLD F H. Catalysts on demand: selective oxidations by laboratory-evolved cytochrome P450 BM3[J]. Chimia, 2009, 63(6): 309. |
38 | COELHO P S, BRUSTAD E M, KANNAN A, et al. Olefin cyclopropanation via carbene transfer catalyzed by engineered cytochrome P450 enzymes[J]. Science, 2013, 339(6117): 307-310. |
109 | ZHOU Z Y, ZHANG L, YU Y X, et al. Enhancing efficiency of protein language models with minimal wet-lab data through few-shot learning[J]. Nature Communications, 2024, 15: 5566. |
110 | WITTMANN B J, YUE Y S, ARNOLD F H. Informed training set design enables efficient machine learning-assisted directed protein evolution[J]. Cell Systems, 2021, 12(11): 1026-1045.e7. |
111 | THURONYI B W, KOBLAN L W, LEVY J M, et al. Continuous evolution of base editors with expanded target compatibility and improved activity[J]. Nature Biotechnology, 2019, 37(9): 1070-1079. |
112 | HU J H, MILLER S M, GEURTS M H, et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity[J]. Nature, 2018, 556(7699): 57-63. |
113 | JUDGE A, SANKARAN B, HU L Y, et al. Network of epistatic interactions in an enzyme active site revealed by large-scale deep mutational scanning[J]. Proceedings of the National Academy of Sciences of the United States of America, 2024, 121(12): e2313513121. |
114 | OLSON C A, WU N C, SUN R. A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain[J]. Current Biology, 2014, 24(22): 2643-2651. |
115 | LIU G, ZENG H Y, MUELLER J, et al. Antibody complementarity determining region design using high-capacity machine learning[J]. Bioinformatics, 2020, 36(7): 2126-2133. |
116 | FERNANDEZ-DE-COSSIO-DIAZ J, UGUZZONI G, PAGNANI A. Unsupervised inference of protein fitness landscape from deep mutational scan[J]. Molecular Biology and Evolution, 2021, 38(1): 318-328. |
117 | SESTA L, UGUZZONI G, FERNANDEZ-DE-COSSIO-DIAZ J, et al. AMaLa: analysis of directed evolution experiments via annealed mutational approximated landscape[J]. International Journal of Molecular Sciences, 2021, 22(20): 10908. |
118 | SHEN M W, ZHAO K T, LIU D R. Reconstruction of evolving gene variants and fitness from short sequencing reads[J]. Nature Chemical Biology, 2021, 17(11): 1188-1198. |
119 | ALVAREZ S, NARTEY C M, MERCADO N, et al. In vivo functional phenotypes from a computational epistatic model of evolution[J]. Proceedings of the National Academy of Sciences of the United States of America, 2024, 121(6): e2308895121. |
120 | DI BARI L, BISARDI M, COTOGNO S, et al. Emergent time scales of epistasis in protein evolution[J]. Proceedings of the National Academy of Sciences of the United States of America, 2024, 121(40): e2406807121. |
121 | JIANG K Y, YAN Z Q, DI BERNARDO M, et al. Rapid in silico directed evolution by a protein language model with EVOLVEpro[J]. Science, 2025, 387(6732): eadr6006. |
122 | LANDWEHR G M, BOGART J W, MAGALHAES C, et al. Accelerated enzyme engineering by machine-learning guided cell-free expression[J]. Nature Communications, 2025, 16: 865. |
123 | JIANG F, LI M C, DONG J J, et al. A general temperature-guided language model to design proteins of enhanced stability and activity[J]. Science Advances, 2024, 10(48): eadr2641. |
124 | HIE B L, SHANKER V R, XU D, et al. Efficient evolution of human antibodies from general protein language models[J]. Nature Biotechnology, 2024, 42(2): 275-283. |
125 | SHANKER V R, BRUUN T U J, HIE B L, et al. Unsupervised evolution of protein and antibody complexes with a structure-informed language model[J]. Science, 2024, 385(6704): 46-53. |
126 | BEDBROOK C N, YANG K K, ROBINSON J E, et al. Machine learning-guided channelrhodopsin engineering enables minimally invasive optogenetics[J]. Nature Methods, 2019, 16(11): 1176-1184. |
127 | UNGER E K, KELLER J P, ALTERMATT M, et al. Directed evolution of a selective and sensitive serotonin sensor via machine learning[J]. Cell, 2020, 183(7): 1986-2002.e26. |
128 | SAITO Y, OIKAWA M, NAKAZAWA H, et al. Machine-learning-guided mutagenesis for directed evolution of fluorescent proteins[J]. ACS Synthetic Biology, 2018, 7(9): 2014-2022. |
129 | CHENG X Y, CHEN B, LI P, et al. Training compute-optimal protein language models[C/OL]//Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 2024, 37: 69386-69418 [2025-06-03]. |
130 | LUO Y N, JIANG G D, YU T H, et al. ECNet is an evolutionary context-integrated deep learning framework for protein engineering[J]. Nature Communications, 2021, 12: 5743. |
131 | LI M C, KANG L Q, XIONG Y, et al. SESNet: sequence-structure feature-integrated deep learning method for data-efficient protein engineering[J]. Journal of Cheminformatics, 2023, 15(1): 12. |
132 | DIECKHAUS H, KUHLMAN B. Protein stability models fail to capture epistatic interactions of double point mutations[EB/OL]. bioRxiv, 2024: 2024.08.20.608844. (2024-08-20)[2025-06-03]. |
39 | CHEN H Q, LIU S, PADULA S, et al. Efficient, continuous mutagenesis in human cells using a pseudo-random DNA editor[J]. Nature Biotechnology, 2020, 38(2): 165-168. |
40 | STIFFLER M A, HEKSTRA D R, RANGANATHAN R. Evolvability as a function of purifying selection in TEM-1 β-lactamase[J]. Cell, 2015, 160(5): 882-892. |
41 | STARR T N, GREANEY A J, HILTON S K, et al. Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding[J]. Cell, 2020, 182(5): 1295-1310.e20. |
42 | MA L, LIN Y H. Orthogonal RNA replication enables directed evolution and Darwinian adaptation in mammalian cells[J]. Nature Chemical Biology, 2025, 21(3): 451-463. |
43 | COLLINS C H, LEADBETTER J R, ARNOLD F H. Dual selection enhances the signaling specificity of a variant of the quorum-sensing transcriptional activator LuxR[J]. Nature Biotechnology, 2006, 24(6): 708-712. |
44 | MORRISON M S, PODRACKY C J, LIU D R. The developing toolkit of continuous directed evolution[J]. Nature Chemical Biology, 2020, 16(6): 610-619. |
45 | MOLINA R S, RIX G, MENGISTE A A, et al. In vivo hypermutation and continuous evolution[J]. Nature Reviews Methods Primers, 2022, 2: 36. |
46 | ESVELT K M, CARLSON J C, LIU D R. A system for the continuous directed evolution of biomolecules[J]. Nature, 2011, 472(7344): 499-503. |
47 | PACKER M S, REES H A, LIU D R. Phage-assisted continuous evolution of proteases with altered substrate specificity[J]. Nature Communications, 2017, 8: 956. |
48 | BLUM T R, LIU H, PACKER M S, et al. Phage-assisted evolution of botulinum neurotoxin proteases with reprogrammed specificity[J]. Science, 2021, 371(6531): 803-810. |
49 | MILLER S M, WANG T N, RANDOLPH P B, et al. Continuous evolution of SpCas9 variants compatible with non-G PAMs[J]. Nature Biotechnology, 2020, 38(4): 471-481. |
50 | RICHTER M F, ZHAO K T, ETON E, et al. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity[J]. Nature Biotechnology, 2020, 38(7): 883-891. |
133 | YU T H, BOOB A G, SINGH N, et al. In vitro continuous protein evolution empowered by machine learning and automation[J]. Cell Systems, 2023, 14(8): 633-644. |
134 | GELMAN S, JOHNSON B, FRESCHLIN C, et al. Biophysics-based protein language models for protein engineering[EB/OL]. bioRxiv, 2024: 2024.03.15.585128. (2024-03-15)[2025-06-03]. |
135 | OLIVARES-GIL A, BARBERO-APARICIO J A, RODRÍGUEZ J J, et al. Semi-supervised prediction of protein fitness for data-driven protein engineering[J]. Journal of Cheminformatics, 2025, 17(1): 88. |
136 | VIG J, MADANI A, VARSHNEY L R, et al. BERTology meets biology: interpreting attention in protein language models[EB/OL]. arXiv, 2020: 2006.15222. (2020-06-26)[2025-06-03]. |
137 | CHEN L, ZHANG Z H, LI Z H, et al. Learning protein fitness landscapes with deep mutational scanning data from multiple sources[J]. Cell Systems, 2023, 14(8): 706-721.e5. |
51 | MERCER J A M, DECARLO S J, ROY BURMAN S S, et al. Continuous evolution of compact protein degradation tags regulated by selective molecular glues[J]. Science, 2024, 383(6688): eadk4422. |
52 | ENGLISH J G, OLSEN R H J, LANSU K, et al. VEGAS as a platform for facile directed evolution in mammalian cells[J]. Cell, 2019, 178(3): 748-761.e17. |
53 | DENES C E, COLE A J, TRAN M T N, et al. The VEGAS platform is unsuitable for mammalian directed evolution[J]. ACS Synthetic Biology, 2022, 11(10): 3544-3549. |
54 | KIMMAN T G, SMIT E, KLEIN M R. Evidence-based biosafety: a review of the principles and effectiveness of microbiological containment measures[J]. Clinical Microbiology Reviews, 2008, 21(3): 403-425. |
55 | ARTIKA I M, MA’ROEF C N. Laboratory biosafety for handling emerging viruses[J]. Asian Pacific Journal of Tropical Biomedicine, 2017, 7(5): 483-491. |
56 | WELLNER A, MCMAHON C, GILMAN M S A, et al. Rapid generation of potent antibodies by autonomous hypermutation in yeast[J]. Nature Chemical Biology, 2021, 17(10): 1057-1064. |
57 | RIX G, WILLIAMS R L, HU V J, et al. Continuous evolution of user-defined genes at 1 million times the genomic mutation rate[J]. Science, 2024, 386(6722): eadm9073. |
58 | TIAN R Z, REHM F B H, CZERNECKI D, et al. Establishing a synthetic orthogonal replication system enables accelerated evolution in E. coli[J]. Science, 2024, 383(6681): 421-426. |
59 | MA Z Y, LI W J, SHEN Y H, et al. EvoAI enables extreme compression and reconstruction of the protein sequence space[J]. Nature Methods, 2025, 22(1): 102-112. |
60 | JOHNSTON K E, ALMHJELL P J, WATKINS-DULANEY E J, et al. A combinatorially complete epistatic fitness landscape in an enzyme active site[J]. Proceedings of the National Academy of Sciences of the United States of America, 2024, 121(32): e2400439121. |