Synthetic Biology Journal ›› 2025, Vol. 6 ›› Issue (3): 636-650. DOI: 10.12211/2096-8280.2025-041

• Invited Review •

DeepSeek model analysis and its applications in AI-assisted protein engineering

LI Mingchen1,2, ZHONG Bozitao1, YU Yuanxi1, JIANG Fan1, ZHANG Liang1, TAN Yang1,2, YU Huiqun2, FAN Guisheng2, HONG Liang1

  1. Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai 201203, China
    2. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2025-05-01  Revised: 2025-06-03  Online: 2025-06-30  Published: 2025-06-27
  • Corresponding author: HONG Liang
  • About the authors: LI Mingchen (b. 1998), male, Ph.D. candidate; research interest: artificial intelligence. E-mail: lmc@mail.ecust.edu.cn
    HONG Liang (b. 1981), male, professor and doctoral supervisor; research interests: molecular biophysics, AI-driven functional protein design, and drug molecule design. E-mail: hongl3liang@sjtu.edu.cn
  • Funding: Shanghai 2023 "Science and Technology Innovation Action Plan" Key Special Project in Computational Biology (23JS1400600)

Abstract:

In early 2025, Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd. released and open-sourced its independently developed DeepSeek-R1 conversational large language model. The model combines very low inference cost with strong chain-of-thought reasoning, performing comparably to, and on some tasks surpassing, proprietary models such as GPT-4o and o1, an achievement that has drawn considerable international attention. Furthermore, DeepSeek's excellent performance in Chinese conversation and its free-for-commercial-use policy have triggered a wave of deployment and adoption within China, promoting the broad accessibility and development of AI technology. This work systematically analyzes the architectural design, training methodology, and inference mechanisms of the DeepSeek model, and explores the transfer potential and application prospects of its core technologies in AI-assisted protein research. The DeepSeek model integrates several independently developed, cutting-edge techniques, including multi-head latent attention, mixture-of-experts (MoE) networks with load balancing, and low-precision training, which together substantially reduce the training and inference costs of Transformer models. Although DeepSeek was originally designed for human language understanding and generation, its optimization techniques are highly relevant to pre-trained protein language models, which are likewise built on the Transformer architecture. By adopting the key technologies employed in DeepSeek, protein language models can be expected to achieve substantial reductions in training and inference costs.
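To make the inference-cost argument concrete, the following is a minimal PyTorch sketch of the key/value compression idea behind multi-head latent attention. The class name, dimensions, and layer shapes are illustrative assumptions rather than DeepSeek's actual configuration, and details of the published design, such as query compression and decoupled rotary position embeddings, are omitted.

```python
# Minimal sketch (assumed names/sizes) of latent KV compression as in
# multi-head latent attention: cache a small shared latent per token
# instead of full per-head keys and values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKVAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # compress to shared latent
        self.k_up = nn.Linear(d_latent, d_model)     # expand latent to per-head keys
        self.v_up = nn.Linear(d_latent, d_model)     # expand latent to per-head values
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        c_kv = self.kv_down(x)  # (B, T, d_latent): all that must be cached
        k = self.k_up(c_kv).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(c_kv).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(B, T, -1))
```

In this toy configuration only the 64-dimensional latent per token needs to be cached during generation, versus 2 × 512 key/value entries per token for standard multi-head attention, which is the source of the inference-cost saving referred to above.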
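Similarly, the sketch below illustrates top-k expert routing with a classic auxiliary load-balancing loss of the kind popularized by GShard and Switch Transformer. DeepSeek-V3 itself replaces this auxiliary loss with a bias-based, auxiliary-loss-free balancing strategy, so this is only a simplified stand-in for the general routing-plus-balancing idea, and all names and sizes are assumptions.

```python
# Minimal sketch (assumed names/sizes) of top-k MoE routing with an
# auxiliary load-balancing loss; not DeepSeek's exact balancing scheme.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=8, k=2):
        super().__init__()
        self.n_experts, self.k = n_experts, k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)  # (n_tokens, n_experts)
        topv, topi = probs.topk(self.k, dim=-1)    # each token keeps k experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_idx, slot = (topi == e).nonzero(as_tuple=True)
            if token_idx.numel() > 0:
                w = topv[token_idx, slot].unsqueeze(-1)  # gate weight per token
                out[token_idx] += w * expert(x[token_idx])
        # Balancing term: product of the dispatched token fraction and the
        # mean routing probability per expert; minimizing it discourages the
        # router from collapsing onto a few experts.
        frac = F.one_hot(topi, self.n_experts).float().sum(dim=(0, 1)) / topi.numel()
        aux_loss = self.n_experts * (frac * probs.mean(dim=0)).sum()
        return out, aux_loss
```

Because each token activates only k of the experts, the parameter count can grow without a proportional increase in per-token compute; the balancing term keeps tokens spread across experts so that this saving is realized in practice.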

Key words: large language models, AI protein, Transformer, protein language model, deep learning
