Synthetic Biology Journal, 2025, Vol. 6, Issue (3): 617-635. DOI: 10.12211/2096-8280.2025-044

• Invited Review •

AI-enabled directed evolution for protein engineering and optimization

SONG Chengzhi1, LIN Yihan1,2,3

  1. Center for Quantitative Biology, Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  2. The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China
  3. Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Peking University, Chengdu 610213, Sichuan, China
  • Received: 2025-05-12 Revised: 2025-06-06 Online: 2025-06-27 Published: 2025-06-30
  • Contact: LIN Yihan

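AI + directed evolution empowers protein engineering and optimization

SONG Chengzhi1, LIN Yihan1,2,3

  1. Center for Quantitative Biology, Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871
  2. The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871
  3. Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Peking University, Chengdu 610213, Sichuan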
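  • Corresponding author: LIN Yihan
  • About the author: SONG Chengzhi (b. 1998), male, PhD candidate. Research interests include systems biology, synthetic biology, and biophysics. E-mail: czsong@stu.pku.edu.cn
  • Funding:
    National Key Research and Development Program of China (2020YFA0906900)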

Abstract:

Directed evolution is one of the core enabling technologies in synthetic biology. By recapitulating natural evolutionary processes in the laboratory, directed evolution uses functional screening to repeatedly isolate variants with improved performance from large mutant libraries, achieving functions that are difficult to obtain with wild-type proteins. In recent years, rapidly advancing artificial intelligence (AI) approaches, such as machine learning and protein language models, have expanded both the range of applications and the operational efficiency of directed evolution, yielding notable advances in the engineering of enzymes, antibodies, biosensors, and more. In this review, we first outline classic strategies and emerging techniques for mutagenesis and functional selection in traditional directed evolution, and then examine various continuous directed evolution systems in depth. We highlight common limitations of directed evolution, in particular its constrained search space and its susceptibility to local optima. Combining rapidly iterating AI methods with directed evolution offers promising solutions to these challenges. Protein language models, in particular, combine patterns learned from experimental variants with fundamental protein properties, which improves predictive accuracy for unexplored mutants and helps extrapolate sequence-function relationships to a broader sequence space. AI-based methods enhance directed evolution workflows at several stages. De novo protein design and unsupervised protein language models help generate functional starting sequences with targeted sequence diversity. Machine learning models trained on experimental data enable the construction of optimized mutant libraries tailored for subsequent selection rounds. In addition, models derived from statistical physics and dynamical systems help extract detailed functional information from data acquired across multiple selection rounds. Together, these machine learning approaches substantially improve the overall efficiency of directed evolution. To illustrate the potential of machine learning-assisted directed evolution, we discuss exemplary cases of protein function improvement and modification. Lastly, we briefly address ongoing challenges and future directions in this rapidly evolving research area.

Key words: directed evolution, machine learning, protein engineering, protein language model, synthetic biology

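Abstract (Chinese version):

Directed evolution is one of the core foundational technologies of synthetic biology. By simulating in the laboratory the evolutionary processes that occur in nature, directed evolution uses functional screening to continually obtain protein sequences with improved performance from large libraries of mutant sequences, helping to achieve functions that are difficult to realize with wild-type proteins. In recent years, continually developing artificial intelligence (AI) methods such as machine learning and protein language models have further expanded the application scenarios and working efficiency of this technology, helping it achieve excellent performance in the engineering of enzymes, antibodies, biosensors, and other targets. This review summarizes the typical strategies used in traditional directed evolution for mutant library construction and functional screening, introduces the efficient continuous directed evolution platforms developed in recent years, and then discusses a series of problems with directed evolution, such as limited sequence space and a tendency to become trapped in local optima. Combining rapidly iterating machine learning models with directed evolution can, on the one hand, alleviate the limits on sequence space exploration and, on the other hand, improve the experimental workflow of directed evolution along multiple dimensions, including starting sequence design, intermediate library optimization, and functional information extraction, thereby enabling more efficient protein engineering efforts. To clarify the application potential of combining directed evolution with machine learning, this review highlights representative cases of machine learning-assisted directed evolution. Finally, potential challenges and future directions in this field are briefly discussed.

Key words: directed evolution, machine learning, protein engineering, protein language model, synthetic biology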
