Synthetic Biology Journal, 2025, Vol. 6, Issue (3): 547-565. DOI: 10.12211/2096-8280.2025-016

• Invited Review •

Protein structural bioinformatics empowered by statistical physics and artificial intelligence

XIA Chenliang1, ZHANG Zecheng2, GUAN Xingyue3, TANG Qianyuan2   

  1. Department of Mathematics and Physics, Sanjiang University, Nanjing 210012, Jiangsu, China
  2. Department of Physics, Hong Kong Baptist University, Hong Kong 999077, China
  3. School of Physics, Nanjing University, Nanjing 210093, Jiangsu, China
  • Received: 2025-03-17 Revised: 2025-04-15 Online: 2025-06-27 Published: 2025-06-30
  • Contact: TANG Qianyuan

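  • About the authors: XIA Chenliang (夏辰亮, b. 1990), male, Ph.D., lecturer. Research interests: statistical physics of protein dynamics. E-mail: xiacl1030@qq.com
    ZHANG Zecheng (张泽成, b. 1992), male, Ph.D. candidate. Research interests: statistics of protein sequence, structure, and dynamics; AI-based protein structure prediction; and biological complexity. E-mail: zhzece@outlook.com
    TANG Qianyuan (唐乾元, b. 1989), male, Ph.D., assistant professor. Research interests: building data-driven theoretical frameworks for complex biological systems. By integrating machine learning, statistical physics, and high-performance computing, he studies complex biological systems across spatiotemporal scales, including protein molecules and the brain, to reveal their universal organizing principles and dynamical laws. E-mail: tangqy@hkbu.edu.hk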
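  • Supported by:
    National Natural Science Foundation of China (12305052); Natural Science Research Project of Jiangsu Higher Education Institutions (22KJD14005); Hong Kong Research Grants Council Early Career Scheme (22302723); Hong Kong Baptist University grant (RC-FNRA-IG/22-23/SCI/03)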

Abstract:

Structural bioinformatics is the computational study of three-dimensional biomolecular structures and their functions, with protein structures as its core research object. Traditional research in this field relied on databases of experimentally determined protein structures but was constrained by the high cost and low throughput of experimental methods. The deep-learning revolution in protein structure prediction, particularly AlphaFold2’s breakthrough, has fundamentally transformed the field’s data landscape by achieving atomic-level prediction accuracy from amino acid sequences alone. The deep integration of statistical physics with big-data analysis has enabled researchers to move beyond traditional case-by-case studies and systematically reveal universal principles of protein design from massive datasets. The accumulation of extensive protein structure data provides a crucial foundation for quantifying long-range correlations in protein dynamics and their evolutionary correspondence, revealing universal principles rooted in the interplay between sequence variability, structural constraint, and functional optimization. These principles not only offer a unified framework for understanding protein structure, dynamics, function, and evolution but also serve as the basis for predictive models and de novo protein design in engineering applications. Building on this foundation, statistical analyses based on the AlphaFold Database highlight the crucial role of data-driven methods in uncovering universal statistical laws and dimensionality reduction principles in protein evolution across increasing organismal complexity, offering fresh perspectives on the fundamental constraints and convergent patterns driving molecular evolution. Because protein function often depends on transitions between multiple conformational states, precise prediction of protein dynamics has become a core research direction.
These advances are propelling protein engineering into an era of precise rational design where researchers can predict and manipulate conformational change pathways to regulate enzyme activity, optimize ligand specificity, and design allosteric responses with unprecedented precision. The research paradigm combining statistical physics and artificial intelligence continues to drive innovation in protein science, enhancing high-throughput screening and rational design efficiency to accelerate translation from basic discoveries to practical applications. As computational capabilities advance and AI models evolve, the field progresses from single protein design toward complex biological system construction, opening new frontiers in synthetic biology, precision medicine, and other applications.

Key words: statistical physics, artificial intelligence, protein structure, protein dynamics, structural bioinformatics, AlphaFold database

