DeepSeek model analysis and its applications in AI-assisted protein engineering
LI Mingchen, ZHONG Bozitao, YU Yuanxi, JIANG Fan, ZHANG Liang, TAN Yang, YU Huiqun, FAN Guisheng, HONG Liang
Synthetic Biology Journal    2025, 6 (3): 636-650.   DOI: 10.12211/2096-8280.2025-041
Abstract

In early 2025, Hangzhou DeepSeek AI Foundation Technology Research Co., Ltd. released and open-sourced its independently developed DeepSeek-R1 conversational large language model. The model exhibits extremely low inference cost and strong chain-of-thought reasoning capability, performing comparably to, and on some tasks surpassing, proprietary models such as GPT-4o and o1, an achievement that has attracted significant international attention. Furthermore, DeepSeek’s excellent performance in Chinese-language conversation and its free-for-commercial-use strategy have ignited a wave of deployment and application within China, promoting the widespread adoption and development of AI technology. This work systematically analyzes the architectural design, training methodology, and inference mechanisms of the DeepSeek model, and explores the transfer potential and application prospects of its core technologies in AI-assisted protein research. The DeepSeek model integrates several cutting-edge, independently developed innovations, including a multi-head latent attention mechanism, a mixture-of-experts (MoE) architecture with load balancing, and low-precision training, which together substantially reduce the training and inference costs of Transformer models. Although DeepSeek was originally designed for human language understanding and generation, its optimization techniques hold significant reference value for pre-trained protein language models, which are also based on the Transformer architecture. By leveraging the key technologies employed in DeepSeek, protein language models are expected to achieve substantial reductions in training and inference costs.
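Of the techniques named in the abstract, the mixture-of-experts idea is perhaps the easiest to illustrate in code. The sketch below shows only the generic top-k routing concept in PyTorch; it is not DeepSeek's actual MoE (which additionally uses fine-grained and shared experts with a dedicated load-balancing strategy), and the class name, dimensions, expert count, and top-k value are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k routed mixture-of-experts feed-forward layer (illustrative only)."""
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                   # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)            # routing probabilities
        weight, idx = scores.topk(self.top_k, dim=-1)       # keep top-k experts per token
        weight = weight / weight.sum(dim=-1, keepdim=True)  # renormalize kept weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                        # tokens routed to expert e
                if mask.any():
                    out[mask] += weight[mask, k:k+1] * expert(x[mask])
        return out

x = torch.randn(16, 512)
print(TopKMoE()(x).shape)  # torch.Size([16, 512])
```

Because each token activates only top_k of the n_experts feed-forward networks, the compute per token stays close to that of a dense layer while the total parameter count grows with the number of experts.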


Fig. 4 Standard normal distribution represented by FP8, FP16 and FP64
Extracts from the Article
In computers, real numbers are usually approximated by floating-point numbers. Common floating-point formats include FP64, FP32, FP16[26], and FP8[27], where the number indicates how many bits the format occupies; for example, FP64 and FP32 occupy 64 and 32 bits of storage, respectively. In general, more bits yield higher numerical precision, but at the cost of greater computational overhead and correspondingly slower arithmetic. FP64 is currently a standard floating-point format with relatively high numerical precision. Figure 4 shows the error, relative to FP64, incurred when a standard normal distribution is represented in FP16 and in FP8 (the E4M3 variant). Compared with FP16, FP8 exhibits a markedly larger representation error and a significant loss of precision, but its computational cost is correspondingly lower.
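As a rough illustration of the effect shown in Figure 4, the sketch below round-trips standard normal samples through FP16 and FP8 (E4M3) and reports the mean absolute representation error relative to FP64. It assumes a PyTorch version that exposes the float8_e4m3fn dtype (2.1 or later); it is not the authors' original experiment.

```python
import torch

# Draw standard normal samples at FP64 precision as the reference.
x_fp64 = torch.randn(1_000_000, dtype=torch.float64)

# Round-trip through lower-precision formats to measure pure representation error.
# torch.float8_e4m3fn is the E4M3 FP8 format (4 exponent bits, 3 mantissa bits).
x_fp16 = x_fp64.to(torch.float16).to(torch.float64)
x_fp8 = x_fp64.to(torch.float32).to(torch.float8_e4m3fn).to(torch.float64)

print("mean |error|, FP16      :", (x_fp64 - x_fp16).abs().mean().item())
print("mean |error|, FP8 (E4M3):", (x_fp64 - x_fp8).abs().mean().item())
```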