Synthetic Biology Journal, 2023, Vol. 4, Issue (3): 571-589. DOI: 10.12211/2096-8280.2023-011
Qiaozhen MENG1, Fei GUO2
Received: 2023-02-06
Revised: 2023-03-28
Online: 2023-06-30
Published: 2023-07-05
Contact: Fei GUO
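Abstract:
Natural enzymes are environmentally friendly and catalytically efficient, but because industrial conditions such as pH and temperature are often unsuitable, they frequently misfold or lose function in actual industrial production. Using artificial intelligence (AI) to assist enzyme engineering and design is more efficient, faster, and cheaper than traditional methods, yet most such work does not consider the "foldability" of the engineered or designed enzyme. Meanwhile, protein structure prediction tools, represented by AlphaFold2, have made breakthrough progress in recent years with the help of AI and now reach atomic-level prediction accuracy. The growing maturity of these tools not only helps in understanding the structure-function mechanisms of proteins but can also enrich existing enzyme structure data for subsequent research. Given that misfolding during the engineering of existing enzymes and the de novo design of new enzymes leads to low success rates and high experimental validation costs, we argue that incorporating protein structure prediction tools into enzyme engineering and design can increase the number of reliable designed enzymes while reducing experimental cost. This review first surveys the applications of AI in enzyme design and engineering, mainly from the perspectives of sequence and structure. It then groups existing protein structure prediction tools into four types and introduces the design principle and predictive ability of each. Next, taking AlphaFold2 as the representative, it summarizes three ways in which structure prediction tools can build on existing techniques to make enzyme engineering more rational and to improve the "foldability" of designed enzymes: (1) as a structure "analyzer"; (2) as a mutation "screener"; (3) as a folding "supervisor". Finally, the discussion summarizes the shortcomings and limitations of current algorithms. With the continued development of AI and of research into the mechanisms of protein function, the accuracy of enzyme engineering and design will improve, which will support the rapid development of synthetic biology.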
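The mutation "screener" role mentioned in the abstract can be pictured as a filter that discards candidate mutants whose predicted folding confidence falls too far below that of the wild type. The following is a minimal sketch under stated assumptions, not the article's method: the confidence function is supplied by the caller (in practice it might be a mean pLDDT from a structure predictor), and the names `Mutant`, `apply_point_mutation`, `screen_mutants`, `toy_confidence` and the `max_drop` threshold are all hypothetical. `toy_confidence` is a placeholder that only makes the example runnable; it has no biological meaning.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@dataclass(frozen=True)
class Mutant:
    name: str      # conventional label, e.g. "A4G"
    sequence: str  # full mutated sequence


def apply_point_mutation(wild_type: str, position: int, new_residue: str) -> Mutant:
    """Return the single-point mutant at a 1-based position."""
    if not 1 <= position <= len(wild_type):
        raise ValueError("position out of range")
    if new_residue not in AMINO_ACIDS:
        raise ValueError("unknown residue")
    old = wild_type[position - 1]
    seq = wild_type[: position - 1] + new_residue + wild_type[position:]
    return Mutant(f"{old}{position}{new_residue}", seq)


def screen_mutants(
    wild_type: str,
    mutants: Iterable[Mutant],
    confidence_fn: Callable[[str], float],
    max_drop: float = 2.0,  # hypothetical tolerance, in the units of confidence_fn
) -> List[Mutant]:
    """Keep mutants whose confidence is at most max_drop below the wild type,
    sorted from most to least confident."""
    baseline = confidence_fn(wild_type)
    scored = [(confidence_fn(m.sequence), m) for m in mutants]
    kept = [(s, m) for s, m in scored if s >= baseline - max_drop]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in kept]


def toy_confidence(sequence: str) -> float:
    """Placeholder score (NOT a real predictor): penalize each proline."""
    return 100.0 - 10.0 * sequence.count("P")


if __name__ == "__main__":
    wt = "MKTAYIAK"
    candidates = [apply_point_mutation(wt, 4, "P"), apply_point_mutation(wt, 4, "G")]
    for m in screen_mutants(wt, candidates, toy_confidence):
        print(m.name, m.sequence)
```

Passing the scoring function as an argument keeps the filter independent of any particular predictor, so the same screening logic could be reused with whichever confidence measure is available.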
Qiaozhen MENG, Fei GUO. Applications of foldability in intelligent enzyme engineering and design: take AlphaFold2 for example[J]. Synthetic Biology Journal, 2023, 4(3): 571-589.
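Method / author | Type | Model framework | Input | Output | Training set | Application | Features | Website/GitHub |
---|---|---|---|---|---|---|---|---|
SCUBA | Backbone design | NC-NN | Secondary-structure motifs | Backbone | PDB | Two-layer α/β proteins; four-helix bundle proteins; EXTD | Moves beyond earlier methods' restriction to existing patterns; builds an energy function in neural-network form based on kernel density estimation | https://doi.org/10.5281/zenodo.4533424 |
Namrata Anand | Backbone design | DCGAN | - | Distance map | Distance maps | Completing missing parts of structures | Uses the relative distances between Cα atoms as constraints and optimizes them | - |
Mire Zloh | Sequence generation | LSTM | - | Sequence | CAMP+DBAASP+DRAMP+YADAMP | - | Designs short peptides with potential antimicrobial activity against E. coli and compares their structure and surface properties with typical AMP structures | - |
Gisbert Schneider | Sequence generation | RNN | - | Sequence | ADAM/APD/DADP | Designing peptides with antimicrobial function | Designed peptides are more likely to have antimicrobial activity than randomly generated peptides | https://github.com/alexarnimueller/LSTM_peptides |
ProteinGAN | Sequence generation | GAN | - | Sequence | MDH sequences | MDH enzymes | Designs enzymes with the same function as malate dehydrogenase (MDH); designs can carry mutations at more than 100 positions simultaneously | https://github.com/Biomatter-Designs/ProteinGAN |
Mostafa Karimi | Sequence generation for a given fold | gcWGAN | - | Sequence | SCOPe v. 2.07 | - | Builds a sequence-to-fold predictor as an "oracle" that guides generated sequences toward the specified fold type | https://github.com/Shen-Lab/gcWGAN |
ProteinMPNN | Sequence design, structure-constrained | Autoregressive model with structure encoder and sequence decoder | 3D structure | Sequence | CATH 4.2 | Monomers, cyclic oligomers, protein nanoparticles | Learns residue types from structure; incorporates pairwise atomic distance potentials into the edge features, directly raising sequence recovery by about 7.8% | https://github.com/dauparas/ProteinMPNN |
ABACUS-R | Sequence design, structure-constrained | Structure encoder and sequence decoder (Transformer) | 3D structure | Sequence | CATH 4.2 | Three backbone structures (PDB ID: 1r26, 1cy5 and 1ubq) | Learns residue types from structure; multi-task learning | https://github.com/liuyf020419/ABACUS-R |
David T. Jones | Sequence design, structure-constrained | Greedy semi-random walk: end-to-end design that iteratively mutates a starting sequence | Sequence | Sequence | - | Top7; Peak6; Foldit1; Ferredog-Diesel | Uses AlphaFold2 to predict the structure and pLDDT score of each generated sequence in order to choose mutation sites, and uses distance maps to constrain the structure to the given target; the initial sequence is produced by a generative model under AlphaFold2 structural constraints | |
AlphaDesign | Sequence design, structure-constrained | Evolution-based genetic algorithm that iteratively generates sequences | Random sequence | Sequence | - | Designing stable monomers and dimers up to hexamers | Uses the difference between the AlphaFold2-predicted structure and the target backbone to steer sequence optimization | - |
trDesign | Sequence design, structure-constrained | trRosetta | Random sequence | Sequence | - | - | Gradients of a loss on 2D distance histograms update the sequence, which is represented as a PSSM; can be viewed as the inverse of the folding problem | https://github.com/gjoni/trDesign |
Hallucination | Sequence design, structure-constrained, backbone not fixed | trRosetta | Random sequence | Sequence/structure | Background distribution probabilities trained on PDB | 2,000 new hallucinated sequences designed; 129 chosen after clustering for expression, of which 62 proteins were soluble and highly stable | Starts from a random sequence and maximizes the structural divergence from random background sequences, constraining the sequence to have typical 2D structural features | https://github.com/gjoni/trDesign |
Constrained hallucination2 | Sequence design, structure-constrained | RoseTTAFold | Sequence/structure | Sequence/structure | RoseTTAFold training set | | Designs sequences containing a given motif by repeated network inference and backpropagation | https://github.com/RosettaCommons/RFDesign |
RFjoint | Sequence design, structure-constrained | RoseTTAFold (trained) | Sequence/structure | Sequence/structure | Fine-tuning; 25% PDB (2020-02-17), 75% AF2-predicted structures | Immunogens; metal binding; new enzymes; specific protein binders | Adds a loss for recovering sequence and structure information simultaneously; directly trains a new model | |
PiFold | Sequence design | GNN | 3D structure | Sequence | CATH | Sequence recovery: 51.66% (CATH 4.2), 58.72% (TS50), 60.42% (TS500) | Designs a new residue featurizer; PiGNN layers learn multi-scale (node, edge, global) residue interaction information | https://github.com/A4Bio/PiFold |
ProDESIGN-LE | Sequence design | Transformer+MLP | 3D structure | Sequence | PDB40 | New sequences designed for the CAT III enzyme, 3 of 5 expressed and soluble; GFP | Uses a Transformer to learn how each residue depends on its local structural environment, so that designed residue types fit their local environment | http://81.70.37.223/; https://github.com/bigict/ProDESIGN-LE |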
Table 1 Summary of protein design tools
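Several rows of Table 1 describe iterative, predictor-guided sequence optimization, for example a greedy semi-random walk that mutates a starting sequence step by step and uses a structure-based score to decide which mutations to keep. The loop can be sketched as follows. This is an illustration under stated assumptions rather than any listed tool's implementation: the function names, the step count, and the seed are hypothetical, and `toy_score` (identity to a fixed target string) merely stands in for a real score such as pLDDT or agreement with a target distance map, which would require running a structure predictor.

```python
import random
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def greedy_mutation_walk(
    start: str,
    score_fn: Callable[[str], float],
    steps: int = 200,
    seed: int = 0,
) -> Tuple[str, List[float]]:
    """Propose one random point mutation per step and accept it only if the
    score strictly improves. Returns the final sequence and the score after
    every step (the first entry is the starting score)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    current = start
    current_score = score_fn(current)
    history = [current_score]
    for _ in range(steps):
        pos = rng.randrange(len(current))
        new_residue = rng.choice(AMINO_ACIDS)
        if new_residue != current[pos]:
            candidate = current[:pos] + new_residue + current[pos + 1:]
            candidate_score = score_fn(candidate)
            if candidate_score > current_score:  # greedy acceptance rule
                current, current_score = candidate, candidate_score
        history.append(current_score)
    return current, history


def toy_score(sequence: str, target: str = "MKTAYIAKQR") -> float:
    """Placeholder objective (NOT a structure predictor): fraction of
    positions identical to a fixed target sequence."""
    return sum(a == b for a, b in zip(sequence, target)) / len(target)


if __name__ == "__main__":
    final, scores = greedy_mutation_walk("AAAAAAAAAA", toy_score, steps=500, seed=1)
    print(final, scores[0], scores[-1])
```

Because only improving mutations are accepted, the score history never decreases. A real pipeline would probably relax this rule (for instance with simulated annealing) to escape local optima, and would pay the cost of one structure prediction per proposed mutation.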
Fig. 1 Applications of structure prediction tools in the intelligent design and engineering of enzymes
1 | FERRER S, RUIZ-PERNÍA J, MARTÍ S, et al. Hybrid schemes based on quantum mechanics/molecular mechanics simulations goals to success, problems, and perspectives[J]. Advances in Protein Chemistry and Structural Biology, 2011, 85: 81-142. |
2 | MAZURENKO S, PROKOP Z, DAMBORSKY J. Machine learning in enzyme engineering[J]. ACS Catalysis, 2020, 10(2): 1210-1223. |
3 | DINMUKHAMED T, HUANG Z Y, LIU Y F, et al. Current advances in design and engineering strategies of industrial enzymes[J]. Systems Microbiology and Biomanufacturing, 2021, 1(1): 15-23. |
4 | YANG H Q, LI J H, SHIN H D, et al. Molecular engineering of industrial enzymes: recent advances and future prospects[J]. Applied Microbiology and Biotechnology, 2014, 98(1): 23-29. |
5 | SHELDON R A, PEREIRA P C. Biocatalysis engineering: the big picture[J]. Chemical Society Reviews, 2017, 46(10): 2678-2691. |
6 | LI G Y, DONG Y J, REETZ M T. Can machine learning revolutionize directed evolution of selective enzymes?[J]. Advanced Synthesis & Catalysis, 2019, 361(11): 2377-2386. |
7 | JIANG L, ALTHOFF E A, CLEMENTE F R, et al. De novo computational design of retro-aldol enzymes[J]. Science, 2008, 319(5868): 1387-1391. |
8 | RÖTHLISBERGER D, KHERSONSKY O, WOLLACOTT A M, et al. Kemp elimination catalysts by computational enzyme design[J]. Nature, 2008, 453(7192): 190-195. |
9 | SIEGEL J B, ZANGHELLINI A, LOVICK H M, et al. Computational design of an enzyme catalyst for a stereoselective bimolecular Diels-Alder reaction[J]. Science, 2010, 329(5989): 309-313. |
10 | YANG K K, WU Z, ARNOLD F H. Machine-learning-guided directed evolution for protein engineering[J]. Nature Methods, 2019, 16(8): 687-694. |
11 | SUN J Y, CUI Y L, WU B. GRAPE, a greedy accumulated strategy for computational protein engineering[J]. Methods in Enzymology, 2021, 648: 207-230. |
12 | PEARCE R, HUANG X, OMENN G S, et al. De novo protein fold design through sequence-independent fragment assembly simulations[J]. Proceedings of the National Academy of Sciences of the United States of America, 2023, 120(4): e2208275120. |
13 | LISTOV D, LIPSH-SOKOLIK R, ROSSET S, et al. Assessing and enhancing foldability in designed proteins[J]. Protein Science, 2022, 31(9): e4400. |
14 | TUNYASUVUNAKOOL K, ADLER J, WU Z, et al. Highly accurate protein structure prediction for the human proteome[J]. Nature, 2021, 596(7873): 590-596. |
15 | SENIOR A W, EVANS R, JUMPER J, et al. Improved protein structure prediction using potentials from deep learning[J]. Nature, 2020, 577(7792): 706-710. |
16 | YANG J Y, ANISHCHENKO I, PARK H, et al. Improved protein structure prediction using predicted interresidue orientations[J]. Proceedings of the National Academy of Sciences of the United States of America, 2020, 117(3): 1496-1503. |
17 | KAWASHIMA S, KANEHISA M. AAindex: amino acid index database[J]. Nucleic Acids Research, 2000, 28(1): 374. |
18 | SANDBERG M, ERIKSSON L, JONSSON J, et al. New chemical descriptors relevant for the design of biologically active peptides. A multivariate characterization of 87 amino acids[J]. Journal of Medicinal Chemistry, 1998, 41(14): 2481-2491. |
19 | KULIKOVA A V, DIAZ D J, LOY J M, et al. Learning the local landscape of protein structures with convolutional neural networks[J]. Journal of Biological Physics, 2021, 47(4): 435-454. |
20 | ASGARI E, MOFRAD M R. Continuous distributed representation of biological sequences for deep proteomics and genomics[J]. PLoS One, 2015, 10(11): e0141287. |
21 | MEIER J, RAO R S, VERKUIL R, et al. Language models enable zero-shot prediction of the effects of mutations on protein function[C/OL]// Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34: 29287-29303[2023-02-01]. |
22 | RAO R, BHATTACHARYA N, THOMAS N, et al. Evaluating protein transfer learning with TAPE[J]. Advances in Neural Information Processing Systems, 2019, 32: 9689-9701. |
23 | SVERRISSON F, FEYDY J, CORREIA B E, et al. Fast end-to-end learning on protein surfaces[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 20-25, 2021, Nashville, Tennessee, USA. IEEE, 2021: 15267-15276. |
24 | JIANG Y, RAN X, YANG Z J. Data-driven enzyme engineering to identify function-enhancing enzymes[J]. Protein Engineering, Design & Selection, 2023, 36: gzac009. |
25 | WU Z, KAN S B J, LEWIS R D, et al. Machine learning-assisted directed protein evolution with combinatorial libraries[J]. Proceedings of the National Academy of Sciences of the United States of America, 2019, 116(18): 8852-8858. |
26 | BISWAS S, KHIMULYA G, ALLEY E C, et al. Low-N protein engineering with data-efficient deep learning[J]. Nature Methods, 2021, 18(4): 389-396. |
27 | SHASHKOVA T I, UMERENKOV D, SALNIKOV M, et al. SEMA: antigen B-cell conformational epitope prediction using deep transfer learning[J]. Frontiers in Immunology, 2022, 13: 960985. |
28 | LU H Y, DIAZ D J, CZARNECKI N J, et al. Machine learning-aided engineering of hydrolases for PET depolymerization[J]. Nature, 2022, 604(7907): 662-667. |
29 | SHROFF R, COLE A W, DIAZ D J, et al. Discovery of novel gain-of-function mutations guided by structure-based deep learning[J]. ACS Synthetic Biology, 2020, 9(11): 2927-2935. |
30 | RIVES A, MEIER J, SERCU T, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences[J]. Proceedings of the National Academy of Sciences of the United States of America, 2021, 118(15): e2016239118. |
31 | PERTUSI D A, MOURA M E, JEFFRYES J G, et al. Predicting novel substrates for enzymes with minimal experimental effort with active learning[J]. Metabolic Engineering, 2017, 44: 171-181. |
32 | HUANG B, XU Y, HU X H, et al. A backbone-centred energy function of neural networks for protein design[J]. Nature, 2022, 602(7897): 523-528. |
33 | ANAND N, HUANG P S. Generative modeling for protein structures[EB/OL]. Advances in Neural Information Processing Systems 31 (NeurIPS 2018), 2018, 31[2023-02-01]. |
34 | ANAND N, EGUCHI R R, HUANG P S. Fully differentiable full-atom protein backbone generation[C/OL]//Deep Generative Models for Highly Structured Data, New Orleans, Louisiana, USA, May 6-9, 2019, ICLR 2019 Workshop, 2019[2023-02-01]. |
35 | WANG C, GARLICK S, ZLOH M. Deep learning for novel antimicrobial peptide design[J]. Biomolecules, 2021, 11(3): 471. |
36 | MÜLLER A T, HISS J A, SCHNEIDER G. Recurrent neural network model for constructive peptide design[J]. Journal of Chemical Information and Modeling, 2018, 58(2): 472-479. |
37 | REPECKA D, JAUNISKIS V, KARPUS L, et al. Expanding functional protein sequence spaces using generative adversarial networks[J]. Nature Machine Intelligence, 2021, 3(4): 324-333. |
38 | KARIMI M, ZHU S W, CAO Y, et al. De novo protein design for novel folds using guided conditional Wasserstein generative adversarial networks[J]. Journal of Chemical Information and Modeling, 2020, 60(12): 5667-5681. |
39 | DAUPARAS J, ANISHCHENKO I, BENNETT N, et al. Robust deep learning-based protein sequence design using ProteinMPNN[J]. Science, 2022, 378(6615): 49-56. |
40 | LIU Y F, ZHANG L, WANG W L, et al. Rotamer-free protein sequence design based on deep learning and self-consistency[J]. Nature Computational Science, 2022, 2(7): 451-462. |
41 | MOFFAT L, GREENER J G, JONES D T. Using AlphaFold for rapid and accurate fixed backbone protein design[EB/OL]. bioRxiv, 2021: 2021.08.24.457549[2023-02-01]. |
42 | JENDRUSCH M, KORBEL J, SADIQ S. AlphaDesign: a de novo protein design framework based on AlphaFold[EB/OL]. bioRxiv, 2021: 2021.10.11.463937[2023-02-01]. |
43 | NORN C, WICKY B I M, JUERGENS D, et al. Protein sequence design by explicit energy landscape optimization[EB/OL]. bioRxiv, 2020: 10.1101/2020.07.23.218917[2023-02-01]. |
44 | ANISHCHENKO I, PELLOCK S J, CHIDYAUSIKU T M, et al. De novo protein design by deep network hallucination[J]. Nature, 2021, 600(7889): 547-552. |
45 | WANG J, LISANZA S, JUERGENS D, et al. Scaffolding protein functional sites using deep learning[J]. Science, 2022, 377(6604): 387-394. |
46 | GAO Z, TAN C, LI S Z. PiFold: toward effective and efficient protein inverse folding[EB/OL]. arXiv, 2022: 2209.12643[2023-02-01]. |
47 | HUANG B, FAN T W, WANG K Y, et al. Accurate and efficient protein sequence design through learning concise local environment of residues[J]. Bioinformatics, 2023, 39(3): btad122. |
48 | XIONG P, WANG M, ZHOU X Q, et al. Protein design with a comprehensive statistical energy function and boosted by experimental selection for foldability[J]. Nature Communications, 2014, 5: 5330. |
49 | GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144. |
50 | RADFORD A, METZ L, CHINTALA S. Unsupervised representation learning with deep convolutional generative adversarial networks[EB/OL]. arXiv, 2015: 1511.06434[2023-02-01]. |
51 | HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. |
52 | KINGMA D P, WELLING M. Auto-encoding variational bayes[EB/OL]. arXiv, 2013: 1312.6114[2023-02-01]. |
53 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. December 4-9, 2017, Long Beach, California, USA. New York: ACM, 2017: 6000-6010. |
54 | INGRAHAM J, GARG V K, BARZILAY R, et al. Generative models for graph-based protein design[C/OL]// Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2019, 32[2023-02-01]. |
55 | MCPARTLON M, LAI B, XU J B. A deep SE(3)-equivariant model for learning inverse protein folding[EB/OL]. bioRxiv, 2022[2023-02-01]. |
56 | HOU J, ADHIKARI B, CHENG J L. DeepSF: deep convolutional neural network for mapping protein sequences to folds[J]. Bioinformatics, 2018, 34(8): 1295-1303. |
57 | ANAND N, EGUCHI R, MATHEWS I I, et al. Protein sequence design with a learned potential[J]. Nature Communications, 2022, 13: 746. |
58 | SUH D, LEE J W, CHOI S, et al. Recent applications of deep learning methods on evolution- and contact-based protein structure prediction[J]. International Journal of Molecular Sciences, 2021, 22(11): 6032. |
59 | BROOKS B R, BRUCCOLERI R E, OLAFSON B D, et al. CHARMM: a program for macromolecular energy, minimization, and dynamics calculations[J]. Journal of Computational Chemistry, 1983, 4(2): 187-217. |
60 | KLEPEIS J L, FLOUDAS C A. ASTRO-FOLD: a combinatorial and global optimization framework for ab initio prediction of three-dimensional structures of proteins from the amino acid sequence[J]. Biophysical Journal, 2003, 85(4): 2119-2146. |
61 | SUBRAMANI A, WEI Y, FLOUDAS C A. ASTRO-FOLD 2.0: an enhanced framework for protein structure prediction[J]. AIChE Journal, 2012, 58(5): 1619-1637. |
62 | BURLEY S K, BERMAN H M, KLEYWEGT G J, et al. Protein data bank (PDB): the single global macromolecular structure archive[J]. Methods in Molecular Biology, 2017, 1607: 627-641. |
63 | XU D, ZHANG Y. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field[J]. Proteins: Structure, Function, and Bioinformatics, 2012, 80(7): 1715-1735. |
64 | YANG J Y, ZHANG Y. I-TASSER server: new development for protein structure and function predictions[J]. Nucleic Acids Research, 2015, 43(W1): W174-W181. |
65 | YANG J Y, YAN R X, ROY A, et al. The I-TASSER Suite: protein structure and function prediction[J]. Nature Methods, 2015, 12(1): 7-8. |
66 | LEAVER-FAY A, TYKA M, LEWIS S M, et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules[M]//Computer Methods, Part C-Methods in Enzymology. Amsterdam: Elsevier, 2011: 545-574. |
67 | JONES D T, BUCHAN D W A, COZZETTO D, et al. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments[J]. Bioinformatics, 2012, 28(2): 184-190. |
68 | BITBOL A F, DWYER R S, COLWELL L J, et al. Inferring interaction partners from protein sequences[J]. Proceedings of the National Academy of Sciences of the United States of America, 2016, 113(43): 12180-12185. |
69 | MORCOS F, PAGNANI A, LUNT B, et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families[J]. Proceedings of the National Academy of Sciences of the United States of America, 2011, 108(49): E1293-E1301. |
70 | SEEMAYER S, GRUBER M, SÖDING J. CCMpred—fast and precise prediction of protein residue-residue contacts from correlated mutations[J]. Bioinformatics, 2014, 30(21): 3128-3130. |
71 | WEIGT M, WHITE R A, SZURMANT H, et al. Identification of direct residue contacts in protein-protein interaction by message passing[J]. Proceedings of the National Academy of Sciences of the United States of America, 2009, 106(1): 67-72. |
72 | KAMISETTY H, OVCHINNIKOV S, BAKER D. Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era[J]. Proceedings of the National Academy of Sciences of the United States of America, 2013, 110(39): 15674-15679. |
73 | WANG S, SUN S Q, LI Z, et al. Accurate de novo prediction of protein contact map by ultra-deep learning model[J]. PLoS Computational Biology, 2017, 13(1): e1005324. |
74 | XU J B. Distance-based protein folding powered by deep learning[J]. Proceedings of the National Academy of Sciences of the United States of America, 2019, 116(34): 16856-16865. |
75 | GREENER J G, KANDATHIL S M, JONES D T. Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints[J]. Nature Communications, 2019, 10: 3977. |
76 | BRUNGER A T. Version 1.2 of the crystallography and NMR system[J]. Nature Protocols, 2007, 2(11): 2728-2733. |
77 | ZHENG W, WUYUN Q Q G, ZHOU X G, et al. Integrating deep neural network models with I-TASSER for accurate protein structure prediction[EB/OL]. 2022[2023-02-01]. |
78 | LI Y, ZHANG C X, YU D J, et al. Deep learning geometrical potential for high-accuracy ab initio protein structure prediction[J]. iScience, 2022, 25(6): 104425. |
79 | ALQURAISHI M. End-to-end differentiable learning of protein structure[J]. Cell Systems, 2019, 8(4): 292-301.e3. |
80 | LIN Z M, AKIN H, RAO R, et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction[EB/OL]. bioRxiv, 2022: 10.1101/2022.07.20.500902[2023-02-01]. |
81 | WANG W K, PENG Z L, YANG J Y. Single-sequence protein structure prediction using supervised transformer protein language models[J]. Nature Computational Science, 2022, 2(12): 804-814. |
82 | WU R D, DING F, WANG R, et al. High-resolution de novo structure prediction from primary sequence[EB/OL]. bioRxiv, 2022[2023-02-01]. |
83 | CHOWDHURY R, BOUATTA N, BISWAS S, et al. Single-sequence protein structure prediction using a language model and deep learning[J]. Nature Biotechnology, 2022, 40(11): 1617-1623. |
84 | BAEK M, DIMAIO F, ANISHCHENKO I, et al. Accurate prediction of protein structures and interactions using a three-track neural network[J]. Science, 2021, 373(6557): 871-876. |
85 | LIPSH-SOKOLIK R, KHERSONSKY O, SCHRÖDER S P, et al. Combinatorial assembly and design of enzymes[J]. Science, 2023, 379(6628): 195-201. |
86 | MOFFAT L, KANDATHIL S M, JONES D T. Design in the DARK: learning deep generative models for de novo protein design[EB/OL]. bioRxiv, 2022: 2022.01.27.478087[2023-02-01]. |
87 | ZHANG Y, SKOLNICK J. TM-align: a protein structure alignment algorithm based on the TM-score[J]. Nucleic Acids Research, 2005, 33(7): 2302-2309. |
88 | BENNETT N, COVENTRY B, GORESHNIK I, et al. Improving de novo protein binder design with deep learning[EB/OL]. bioRxiv, 2022: 2022.06.15.495993[2023-02-01]. |
89 | STEIN R A, MCHAOURAB H S. Modeling alternate conformations with Alphafold2 via modification of the multiple sequence alignment[EB/OL]. bioRxiv, 2021: 2021.11.29.470469[2023-02-01]. |
90 | CASADEVALL G, DURAN C, ESTÉVEZ-GAY M, et al. Estimating conformational heterogeneity of tryptophan synthase with a template-based Alphafold2 approach[J]. Protein Science, 2022, 31(10): e4426. |
91 | GOULET A, CAMBILLAU C, ROUSSEL A, et al. Structure prediction and analysis of hepatitis E virus non-structural proteins from the replication and transcription machinery by AlphaFold2[J]. Viruses, 2022, 14(7): 1537. |
92 | LI H, BAO Q Q, ZHAO J F, et al. Directed evolution engineering to improve activity of glucose dehydrogenase by increasing pocket hydrophobicity[J]. Frontiers in Microbiology, 2022, 13: 1044226. |
93 | BURNIM A A, XU D, SPENCE M A, et al. Analysis of insertions and extensions in the functional evolution of the ribonucleotide reductase family[J]. Protein Science, 2022, 31(12): e4483. |
94 | WU Y T, LIU J Q, HAN X, et al. Eliminating host-guest incompatibility via enzyme mining enables the high-temperature production of N-acetylglucosamine[J]. iScience, 2023, 26(1): 105774. |
95 | BARTAS M, SLYCHKO K, BRÁZDA V, et al. Searching for new Z-DNA/Z-RNA binding proteins based on structural similarity to experimentally validated Zα domain[J]. International Journal of Molecular Sciences, 2022, 23(2): 768. |
96 | SHEN Y, WANG Y L, WEI X, et al. Engineering the active site pocket to enhance the catalytic efficiency of a novel feruloyl esterase derived from human intestinal bacteria Dorea formicigenerans[J]. Frontiers in Bioengineering and Biotechnology, 2022, 10: 936914. |
97 | TSABAN T, VARGA J K, AVRAHAM O, et al. Harnessing protein folding neural networks for peptide-protein docking[J]. Nature Communications, 2022, 13(1): 176. |
98 | LI G, BURIC F, ZRIMEC J, et al. Learning deep representations of enzyme thermal adaptation[J]. Protein Science, 2022, 31(12): e4480. |