Computational protein design: perspectives in methods and applications

doi:10.12211/2096-8280.2020-067

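Abstract

Computational protein design determines the amino acid sequence of a protein rationally, through computation, so that the protein adopts a predefined structure and function. It has gradually developed into a systematic set of methods that has been validated by a growing number of experiments. These methods can be used both for de novo protein design and for the rational engineering of existing proteins; with broad application prospects, they are one of the key enabling technologies of synthetic biology. This article briefly reviews the history of computational protein design methods and introduces their basic methods and ideas from the perspectives of protein energy calculation, automatic optimization of amino acid sequences, de novo design of backbone structures, design of new intermolecular recognition interfaces, and negative design. It also discusses, with examples, how design approaches such as improving structural stability and constructing new molecular interfaces are applied to enzymes, vaccines and self-assembling protein materials. Finally, it analyzes the shortcomings of current methods, namely insufficient design accuracy and difficulty in describing polar interactions, together with pressing open problems such as accounting for non-aqueous solvent environments and optimizing interface design, and it outlines prospective applications of computational protein design in synthetic biology, such as biosensors and logic gates, and in medicine, such as antibody and vaccine design.

Keywords: protein, computational design methods, amino acid sequence, polypeptide backbone structure, intermolecular recognition interface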

Abstract:

In computational protein design, the amino acid sequence of a protein is rationally chosen through computation so that the resulting molecule has the desired structure and function. Systematic methods for computational protein design have been developed and validated in an increasing number of experiments. With strong potential for broad applications, computational protein design is considered an important enabling technology for synthetic biology. Here we briefly review the history of computational protein design methods in three stages: rule-based heuristic design, automatic optimization of amino acid sequences, and de novo backbone design. We then introduce the basic approaches and strategies in detail. For protein energy calculation, we introduce physical and statistical energy terms. Building on these energy functions, we introduce sequence and structure design methods, including automatic optimization of amino acid sequences, de novo design of polypeptide backbones (by fragment assembly or with sequence-independent backbone potentials), the design of new interfaces for intermolecular recognition such as protein-ligand and protein-protein interfaces, and the concept of negative design. In addition, we briefly discuss examples in which computational protein design supports applied studies, including enhancing protein structural stability and the interface-related redesign or de novo design of enzymes, vaccines and protein materials. These examples not only illustrate current uses of computational protein design methods but also point to broader applications in the future.
Finally, we analyze open problems in computational protein design, such as insufficient design accuracy, difficulty in describing polar interactions, and the need to account for non-aqueous solvent environments. We also discuss possible applications in synthetic biology, such as the design of biological logic gates and biosensors, and application prospects in medicine, such as antibody and vaccine design.

Key words: protein, methods for computational design, amino acid sequences, polypeptide backbone structure, interface of intermolecular recognition

CLC number: Q816

CAO Fan, CHEN Yaoxi, MIAO Yangyang, ZHANG Lu, LIU Haiyan. Computational protein design: perspectives in methods and applications[J]. Synthetic Biology Journal, 2021, 2(1): 15-32.

Figures/Tables (7)

Fig. 1 Examples of amino acid sequence patterns in polypeptide chains that form regular spatial structures (hydrophilic and hydrophobic amino acids alternate in a periodic pattern)
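The periodic alternation in Fig. 1 can be expressed as a binary hydrophobic (H) / polar (P) pattern. The Python sketch below is illustrative only and rests on stated assumptions: the set of residues treated as hydrophobic is a simplification, the β-strand pattern is the plain period-2 alternation, and the α-helix pattern approximates the ~3.6-residue helical turn with the seven-residue unit HPPHHPP (about two turns). The names `binary_pattern`, `ideal_pattern` and `pattern_match` are hypothetical helpers, not part of any design package.

```python
# Illustrative binary (H/P) patterning; this hydrophobic set is an assumption.
HYDROPHOBIC = set("AVLIMFWC")

# Hypothetical ideal repeating units: a beta-strand alternates faces every
# residue (period 2); an alpha-helix (~3.6 residues per turn) is approximated
# here by the 7-residue unit HPPHHPP.
IDEAL_UNITS = {"strand": "HP", "helix": "HPPHHPP"}


def binary_pattern(sequence: str) -> str:
    """Map a one-letter amino acid sequence to an H (hydrophobic) / P (polar) string."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence.upper())


def ideal_pattern(kind: str, length: int) -> str:
    """Repeat the ideal unit for a secondary-structure type up to `length` residues."""
    unit = IDEAL_UNITS[kind]
    return (unit * (length // len(unit) + 1))[:length]


def pattern_match(sequence: str, kind: str) -> float:
    """Fraction of positions whose H/P class matches the ideal periodic pattern."""
    observed = binary_pattern(sequence)
    if not observed:
        return 0.0
    target = ideal_pattern(kind, len(observed))
    return sum(o == t for o, t in zip(observed, target)) / len(observed)


print(binary_pattern("VKVEVKV"))           # HPHPHPH
print(pattern_match("VKVEVKV", "strand"))  # 1.0
print(pattern_match("LKELLKK", "helix"))   # 1.0
```

The comparison uses a fixed phase, so a sequence whose pattern is shifted by one position (for example KVKVKVK against the strand pattern) scores 0.0 even though it is equally amphipathic; a fuller check would test every phase.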

Fig. 2 Optimization of amino acid sequences and side-chain conformations for a given backbone (for an input target backbone structure, the sequence and side-chain conformational spaces are searched for the lowest-energy sequences, which are considered the most likely to fold into the target structure)
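The search described in Fig. 2 can be illustrated with a toy, pairwise-decomposable energy: each backbone position has a few (residue type, rotamer) options, and the total energy is the sum of per-position (self) terms and pair terms. The Python sketch below simply enumerates every combination, which is only feasible for tiny problems. The positions, options and energy values are hypothetical numbers chosen for illustration; practical design programs rely on dedicated search algorithms (for example dead-end elimination or Monte Carlo sampling) rather than brute force.

```python
from itertools import product

# Hypothetical toy problem: 3 positions, each with (residue type, rotamer index) options.
options = [
    [("L", 0), ("L", 1), ("K", 0)],
    [("A", 0), ("E", 0)],
    [("V", 0), ("K", 0)],
]
# Made-up per-position (self) energies, arbitrary units.
self_energy = [
    {("L", 0): -1.0, ("L", 1): -0.5, ("K", 0): 0.2},
    {("A", 0): -0.3, ("E", 0): -0.1},
    {("V", 0): -0.8, ("K", 0): -0.2},
]
# Made-up pair energies; combinations that are not listed contribute 0.
pair_energy = {
    (0, 1): {(("K", 0), ("E", 0)): -1.5},  # e.g. a favourable salt bridge
    (1, 2): {(("E", 0), ("K", 0)): -1.2},
    (0, 2): {(("L", 0), ("V", 0)): -0.6},  # e.g. a hydrophobic contact
}


def total_energy(assignment):
    """Sum of self terms plus all pairwise terms for one full assignment."""
    energy = sum(self_energy[i][choice] for i, choice in enumerate(assignment))
    for i in range(len(assignment)):
        for j in range(i + 1, len(assignment)):
            energy += pair_energy.get((i, j), {}).get((assignment[i], assignment[j]), 0.0)
    return energy


def design_exhaustive():
    """Return the lowest-energy combination of (residue, rotamer) choices."""
    best = min(product(*options), key=total_energy)
    return best, total_energy(best)


best, energy = design_exhaustive()
print(best, round(energy, 2))  # (('K', 0), ('E', 0), ('K', 0)) -2.8
```

In this toy case the individually best choices (L, A, V) give -2.7 in total, but the pair terms make the K/E/K salt-bridge combination slightly better at -2.8, which is why positions cannot simply be optimized one at a time.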

Fig. 3 Physical energy terms (physical energy functions are generally constructed as the sum of covalent and non-covalent interaction terms)
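Fig. 3 describes a physical energy function as a sum of covalent and non-covalent terms. The Python sketch below shows commonly used functional forms: harmonic bond stretching and a periodic torsion term (covalent), a Lennard-Jones van der Waals term and Coulomb electrostatics (non-covalent). It is a sketch under stated assumptions: the parameter values in the example are placeholders rather than values from any particular force field, the Coulomb factor is an approximate unit conversion, and real force fields add further terms (angle bending, improper torsions, solvation) with their own parameterization.

```python
import math

# Approximate Coulomb conversion factor: e^2/Å -> kcal/mol.
COULOMB_KCAL = 332.06


def bond_energy(r, r0, k):
    """Covalent: harmonic bond stretching, k * (r - r0)**2."""
    return k * (r - r0) ** 2


def torsion_energy(phi, k, n, delta):
    """Covalent: periodic dihedral term, k * (1 + cos(n*phi - delta)); angles in radians."""
    return k * (1 + math.cos(n * phi - delta))


def lennard_jones(r, epsilon, r_min):
    """Non-covalent van der Waals term; reaches its minimum, -epsilon, at r = r_min."""
    x = (r_min / r) ** 6
    return epsilon * (x * x - 2 * x)


def coulomb(r, q1, q2, dielectric=1.0):
    """Non-covalent electrostatics between point charges q1, q2 (in e) at distance r (in Å)."""
    return COULOMB_KCAL * q1 * q2 / (dielectric * r)


def physical_energy(covalent_terms, noncovalent_terms):
    """Total energy as the plain sum of covalent and non-covalent contributions."""
    return sum(covalent_terms) + sum(noncovalent_terms)


# Placeholder parameters for a single illustrative evaluation.
covalent = [bond_energy(1.53, 1.53, 300.0), torsion_energy(math.pi, 1.0, 3, 0.0)]
noncovalent = [lennard_jones(4.0, 0.2, 4.0), coulomb(4.0, 0.5, -0.5, dielectric=4.0)]
print(round(physical_energy(covalent, noncovalent), 3))  # -5.388
```

Dividing the Coulomb term by a dielectric constant (4.0 here, an arbitrary choice) is only a crude way to mimic screening by the environment; implicit-solvent models treat this more carefully.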

Fig. 4 Statistical energy terms of various types (different statistical energy functions are obtained by transforming probability distributions derived from statistical analyses of different kinds of data)
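Fig. 4 notes that statistical energy terms are obtained by transforming probability distributions. A common transformation is the inverse Boltzmann relation, E(x) = -kT ln[P_obs(x) / P_ref(x)], where P_ref is a chosen reference distribution. The Python sketch below applies it to hypothetical counts of residue types at buried positions versus all positions. The counts, the pseudocount and the choice of reference state are assumptions made for illustration; each published statistical energy function defines its own features and reference state.

```python
import math


def statistical_energies(observed_counts, reference_counts, kT=1.0, pseudocount=1.0):
    """Inverse Boltzmann: E(x) = -kT * ln(P_obs(x) / P_ref(x)).

    A pseudocount is added to every state so unobserved states do not give log(0).
    With kT = 1.0 the energies are in units of kT (about 0.593 kcal/mol at 298 K).
    """
    states = sorted(set(observed_counts) | set(reference_counts))
    n_obs = sum(observed_counts.get(s, 0) + pseudocount for s in states)
    n_ref = sum(reference_counts.get(s, 0) + pseudocount for s in states)
    energies = {}
    for s in states:
        p_obs = (observed_counts.get(s, 0) + pseudocount) / n_obs
        p_ref = (reference_counts.get(s, 0) + pseudocount) / n_ref
        energies[s] = -kT * math.log(p_obs / p_ref)
    return energies


# Hypothetical counts: residue types at buried positions (observed)
# versus at all positions (reference).
buried = {"LEU": 300, "ALA": 180, "LYS": 20}
all_positions = {"LEU": 200, "ALA": 150, "LYS": 150}
for residue, e in statistical_energies(buried, all_positions).items():
    print(residue, round(e, 2))
```

With these invented counts, leucine (enriched at buried positions) receives the most negative, i.e. most favourable, energy, alanine a slightly negative one, and lysine (depleted) a positive one.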

Fig. 5 Two backbone design strategies (top: splicing native fragments into a new backbone; bottom: backbone design by optimizing a statistical energy function)
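The first strategy in Fig. 5 splices native fragments into a new backbone. The Python sketch below shows the core loop of a fragment-insertion Monte Carlo search with a Metropolis acceptance rule. Everything concrete in it is a hypothetical placeholder: the backbone is reduced to a list of (phi, psi) torsion angles in degrees, the fragment library holds two invented three-residue fragments per window instead of fragments taken from native structures, and the score simply rewards helical torsions instead of evaluating a real energy function on a 3D model.

```python
import math
import random


def insert_fragment(backbone, start, fragment):
    """Copy of the backbone with torsions start .. start+len(fragment)-1 replaced."""
    new = list(backbone)
    new[start:start + len(fragment)] = fragment
    return new


def assemble(backbone, library, score, steps=1000, kT=1.0, seed=0):
    """Fragment-insertion Monte Carlo with Metropolis acceptance; returns the best backbone seen."""
    rng = random.Random(seed)
    current, current_score = list(backbone), score(backbone)
    best, best_score = current, current_score
    starts = sorted(library)
    for _ in range(steps):
        start = rng.choice(starts)
        trial = insert_fragment(current, start, rng.choice(library[start]))
        trial_score = score(trial)
        delta = trial_score - current_score
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


# Hypothetical toy setup: 6 residues starting in an extended conformation.
HELIX, EXTENDED, OTHER = (-60, -45), (-120, 130), (-70, 140)
start_backbone = [EXTENDED] * 6
library = {i: [[HELIX] * 3, [OTHER] * 3] for i in range(4)}  # windows 0..3, length 3


def helix_score(backbone):
    """Toy score: squared deviation of every (phi, psi) pair from ideal helix torsions."""
    return sum((phi - HELIX[0]) ** 2 + (psi - HELIX[1]) ** 2 for phi, psi in backbone)


best, best_score = assemble(start_backbone, library, helix_score)
print(best_score, helix_score(start_backbone))
```

The best-scoring backbone is tracked separately from the current one because Metropolis sampling may accept uphill moves and drift away from good solutions.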

Fig. 6 Basic steps of protein-protein interface design [the backbone conformation of the ligand protein (red) in complex with the target receptor (green) is designed first; the residue types at the ligand protein interface are then designed and optimized, yielding the final design (blue)]

Fig. 7 Positive design versus negative design (positive design only lowers the energy of the target state and does not consider other states; negative design additionally raises the energies of the other states so that their energy gaps from the target state increase)
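The distinction in Fig. 7 can be made concrete by applying two selection criteria to the same candidates. In the Python sketch below, which uses invented energies for three hypothetical sequences, positive design picks the sequence with the lowest target-state energy, whereas negative design picks the sequence with the largest gap between the target state and its most stable competing state. Measuring specificity by the single most stable competitor is a simplifying assumption; multistate design methods may weight competing states differently.

```python
# Hypothetical energies (arbitrary units) for three invented sequences in the
# target state and in competing states (e.g. misfolded or off-target complexes).
energies = {
    "seq_A": {"target": -10.0, "competitors": [-9.5, -4.0]},
    "seq_B": {"target": -8.0, "competitors": [-3.0, -2.5]},
    "seq_C": {"target": -6.0, "competitors": [-5.5, -1.0]},
}


def positive_design(candidates):
    """Positive design: lowest target-state energy, competing states ignored."""
    return min(candidates, key=lambda name: candidates[name]["target"])


def energy_gap(entry):
    """Energy of the most stable competing state minus the target-state energy."""
    return min(entry["competitors"]) - entry["target"]


def negative_design(candidates):
    """Negative design: largest gap between the target state and its best competitor."""
    return max(candidates, key=lambda name: energy_gap(candidates[name]))


print(positive_design(energies))  # seq_A
print(negative_design(energies))  # seq_B
```

In this toy example seq_A is the most stable in the target state but nearly as stable in a competing state (gap 0.5), so it would be expected to show poor specificity, while seq_B is less stable but separated from its competitors by a gap of 5.0.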

