Synthetic Biology Journal ›› 2021, Vol. 2 ›› Issue (1): 1-14. DOI: 10.12211/2096-8280.2020-074
Ye WANG, Haochen WANG, Minghao YAN, Guanhua HU, Xiaowo WANG
Received: 2020-07-09
Revised: 2020-11-15
Online: 2021-03-22
Published: 2021-03-12
Contact: Xiaowo WANG
About the author: Ye WANG (born 1995), female, PhD candidate. Research interests: pattern recognition and machine learning, bioinformatics, and synthetic biology. E-mail: wangy17@mails.tsinghua.edu.cn
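Abstract:
Synthetic biology follows the principle of learning from nature, re-engineering nature, and ultimately going beyond it. Its core is to optimize, modify, and recombine genetic parts by artificial means to obtain artificial biological systems that meet specified requirements. Obtaining high-performance biological parts is the foundation for building and controlling such systems. In recent years, artificial biomolecules have been widely applied in fields such as metabolic engineering and gene therapy. How to efficiently search the vast molecular sequence space and design sequences with specific biological functions is a key scientific question for synthetic biology. With the rapid development of artificial intelligence, intelligent algorithms have shown great potential for mining complex biological features and designing biomolecules. This review takes the perspective of using complex feature patterns discovered by deep learning to guide the intelligent exploration of the sequence spaces of new drug molecules, nucleic acids, and proteins, and focuses on how deep generative models are applied to the design of different artificial biological sequences. On this basis, drawing on application cases in the design of small-molecule compounds, nucleic acids, and proteins, it summarizes directed optimization strategies for artificial biomolecular sequence design. To assess molecules designed by intelligent algorithms, it systematically analyzes the characteristics of sequence design evaluation schemes used in different fields and from different angles. Finally, it looks ahead to the development of intelligent design of artificial biological sequences: because regulation across the multiple levels of a biological system is highly coupled, sequences at different levels need to be optimized from a system-level perspective in order to advance the intelligent adaptation and optimization of artificial biological systems.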
Ye WANG, Haochen WANG, Minghao YAN, Guanhua HU, Xiaowo WANG. Design of biomolecular sequences by artificial intelligence[J]. Synthetic Biology Journal, 2021, 2(1): 1-14.
Fig. 2 Deep generative models commonly used in biomolecule sequence generation (x denotes the biomolecular sequence to be designed and z its hidden-space representation). (a) Generative adversarial network (GAN). A GAN contains two 'adversarial' networks: the generator G and the discriminator D. The generator tries to capture the data distribution and produces artificial samples to fool the discriminator, whereas the discriminator tries to distinguish generated samples from the training data. After the min-max game between the two networks, the sequences generated by G can be used as artificial biomolecules. (b) Variational autoencoder (VAE). A VAE is a directed probabilistic graphical model built from neural networks with an autoencoder structure. After training, biological sequences are generated by sampling z from the prior and decoding it through the conditional distribution P(x|z). (c) Recurrent neural network (RNN). An RNN is a classical sequence generation model from natural language processing (NLP) that learns how the current output of a sequence depends on the preceding information. Starting from an initial input atom or base, the artificial biomolecule sequence is assembled from the outputs of successive steps.
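For reference, the three model families in Fig. 2 are commonly written as follows. These are the standard formulations of the GAN objective [23], the VAE evidence lower bound [22], and autoregressive sequence generation, given here as a sketch; the symbols $p_{\mathrm{data}}$, $p(z)$, $q_\phi$, $p_\theta$, and the sequence length $T$ are notation assumed for illustration rather than taken from the figure.

$$
\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big]
$$

$$
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
$$

$$
p(x) = \prod_{t=1}^{T} p\big(x_t \mid x_1, \dots, x_{t-1}\big)
$$

In the second expression, $q_\phi(z \mid x)$ is the encoder and $p_\theta(x \mid z)$ the decoder; new sequences come from drawing $z \sim p(z)$ and decoding. In the third, each factor corresponds to one RNN step that emits the next atom, base, or residue.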
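Deep generative model | Biological sequence | Data representation | Model name | Optimization algorithm | References |
---|---|---|---|---|---|
Generative adversarial network | Nucleic acid | One-hot encoding of base sequences | WGAN | Performance-score gradient optimization | [ |
Generative adversarial network | Protein | Amino acid distance matrix | DCGAN | Rosetta-based sampling | [ |
Generative adversarial network | Small-molecule drug | Matrix encoding of molecular graphs | MolGAN | Reinforcement learning | [ |
Variational/adversarial autoencoder | Small-molecule drug | Junction-tree encoding of atom groups | Junction Tree VAE | Bayesian optimization | [ |
Variational/adversarial autoencoder | Small-molecule drug | One-hot encoding of SMILES | ChemVAE | Performance-score gradient optimization | [ |
Variational/adversarial autoencoder | Small-molecule drug | Probabilistic graph composed of adjacency matrix, attribute vectors, etc. | GraphVAE | Conditional generation | [ |
Recurrent neural network | Small-molecule drug | One-hot encoding of SMILES | ChemTS | Monte Carlo tree search | [ |
Recurrent neural network | Small-molecule drug | One-hot encoding of SMILES | LSTM | Transfer learning | [ |
Recurrent neural network | Protein | One-hot encoding of amino acids | LSTM | Transfer learning | [ |
Tab. 1 Applications of deep generative models combined with optimization algorithms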
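Several rows of Table 1 use one-hot encoding (of bases, amino acids, or SMILES characters) as the input representation. A minimal sketch for DNA follows, assuming the four-letter alphabet `ACGT`; the helper names `one_hot_encode` and `one_hot_decode` are hypothetical and not taken from any of the cited tools.

```python
BASES = "ACGT"  # assumed alphabet; real pipelines may also handle N or RNA bases


def one_hot_encode(seq):
    """Return one 4-element row per base, with a single 1 marking the base."""
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        if base not in index:
            raise ValueError(f"unexpected base: {base!r}")
        row = [0] * len(BASES)
        row[index[base]] = 1
        rows.append(row)
    return rows


def one_hot_decode(rows):
    """Map each row back to the base at the position of its largest entry."""
    return "".join(
        BASES[max(range(len(BASES)), key=row.__getitem__)] for row in rows
    )


if __name__ == "__main__":
    encoded = one_hot_encode("GATTACA")
    print(encoded[0])               # row for 'G'
    print(one_hot_decode(encoded))  # round-trips to the input sequence
```

Because decoding takes the largest entry per row, the same helper can also turn the soft, real-valued output of a generator into a discrete sequence.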
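Metric type | Metric | Small-molecule drugs | Protein sequences | Nucleic acid sequences |
---|---|---|---|---|
Distribution-fit evaluation | Validity | Validity of SMILES strings / molecular graphs | Rosetta simulation results | Number of consecutive bases |
Distribution-fit evaluation | Diversity | Fraction of non-duplicate small molecules | Fraction of non-duplicate proteins | Similarity among designed sequences |
Distribution-fit evaluation | Novelty | Fraction of new drugs | Fraction of new proteins | Similarity to natural sequences |
Distribution-fit evaluation | Goodness of distribution fit | Fréchet ChemNet Distance | Empirical fitness distribution score | k-mer correlation with natural sequences |
Distribution-fit evaluation | Conformity to physicochemical constraints | KL divergence of physicochemical properties | Statistical energy function | GC content |
Directed-optimization evaluation | Key structural features | Chemical structural features | Similarity to known important functional groups | Functional motifs or lengths of important spacer sequences |
Directed-optimization evaluation | Test-set redesign ratio | Drug molecule redesign ratio | Not reported | Not reported |
Directed-optimization evaluation | Custom optimized function score | Performance scores such as drug solubility | Function score of protein target sites | Function scores such as regulatory strength |
Tab. 2 Evaluation criteria for biomolecular sequences designed by deep generative models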
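Some of the nucleic acid metrics in Table 2 are simple enough to compute directly. The sketch below shows GC content, k-mer correlation with natural sequences, and two simplified proxies for diversity and novelty (fraction of unique designs, and fraction of designs absent from the natural set). The function names are hypothetical, the alphabet is assumed to be `ACGT`, and the proxies are coarser than the similarity-based measures the table lists for nucleic acids.

```python
from collections import Counter
from itertools import product
from math import sqrt


def gc_content(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def unique_fraction(designed):
    """Diversity proxy: share of designed sequences that are distinct."""
    return len(set(designed)) / len(designed)


def novel_fraction(designed, natural):
    """Novelty proxy: share of designed sequences not found in the natural set."""
    natural_set = set(natural)
    return sum(s not in natural_set for s in designed) / len(designed)


def kmer_frequencies(seqs, k):
    """Relative frequency of every possible k-mer over ACGT, in fixed order."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    total = sum(counts.values())
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def pearson(xs, ys):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def kmer_correlation(designed, natural, k):
    """Correlation between k-mer frequency profiles of two sequence sets."""
    return pearson(kmer_frequencies(designed, k), kmer_frequencies(natural, k))
```

A k-mer correlation close to 1 suggests the designed set reproduces the local composition of the natural set; it says nothing by itself about function, which is what the directed-optimization metrics in the table address.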
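Biological sequence | Order of magnitude | Intelligent algorithms | Advantages | Challenges |
---|---|---|---|---|
Small-molecule drug sequences | 1.5×10^6 [; 1.8×10^6 [; 1500 [ | RNN, AAE, VAE, and GAN, commonly combined with reinforcement learning and transfer learning, for drug sequence design | Abundant accumulated data and databases; relatively mature evaluation systems | Synthesis is relatively difficult; easily synthesizable molecular sequences need to be considered and screened for |
Protein sequences | About 100 000 [ | RNN, GAN, and ANN, commonly combined with the protein design software Rosetta and transfer learning, for protein sequence design [ | Simulation and prediction software such as Rosetta is highly standardized within the field; protein design has broad application scenarios | Accuracy in searching and predicting three-dimensional structures and folding conformations remains limited |
Nucleic acid sequences | Depends on the genome size of the specific species and on the function associated with the sequence | GAN combined with expert knowledge, predictors, etc., for nucleic acid sequence design | Nucleic acid sequences are relatively easy to synthesize, with high design flexibility and short synthesis cycles | Datasets of nucleic acid sequences with specific functions are small; sequences such as regulatory elements lack precise definitions in the genome |
Tab. 3 Advantages and challenges of intelligent design for drug molecules, proteins and nucleic acid sequences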
1 | WANG Wenfang, ZHONG Jianjiang. Recent advances in smart biomanufacturing driven by synthetic biology [J]. Chinese Bulletin of Life Sciences, 2019, 31(4): 95-104. |
2 | SILVA D A, YU S, ULGE U Y, et al. De novo design of potent and selective mimics of IL-2 and IL-15 [J]. Nature, 2019, 565(7738): 186-191. |
3 | KIM Minseon, OH Ilhwan, AHN Jaegyoon. An improved method for prediction of cancer prognosis by network learning [J]. Genes (Basel), 2018, 9(10): 478-491. |
4 | ZHANG Xuegong. From genomics pattern recognition to big data precision medicine [C]//China Automation Congress, 2015. |
5 | SEGLER M H S, PREUSS M, WALLER M P. Planning chemical syntheses with deep neural networks and symbolic AI [J]. Nature, 2018, 555(7698): 604-610. |
6 | LU Xiaoyun, LIU Yuwan, YANG Yiqun, et al. Constructing a synthetic pathway for acetyl-coenzyme A from one-carbon through enzyme design [J]. Nature Communications, 2019, 10(1): 1378-1478. |
7 | LIU Xiaonan, CHENG Jian, ZHANG Guanghui, et al. Engineering yeast for the production of breviscapine by genomic analysis and synthetic biology approaches [J]. Nature Communications, 2018, 9(1): 448-458. |
8 | WANG Meiyan, YU Yuanhuan, SHAO Jiawei, et al. Engineering synthetic optogenetic networks for biomedical applications [J]. Quantitative Biology, 2017, 5(2): 111-123. |
9 | NEVOIGT E, KOHNKE J, FISCHER C R, et al. Engineering of promoter replacement cassettes for fine-tuning of gene expression in Saccharomyces cerevisiae [J]. Applied and Environmental Microbiology, 2006, 72(8): 5266-5273. |
10 | GUIZIOU S, SAUVEPLANE V, CHANG Hung-Ju, et al. A part toolbox to tune genetic expression in Bacillus subtilis [J]. Nucleic Acids Research, 2016, 44(15): 7495-7508. |
11 | RUI M C P, VOGL T, KNIELY C, et al. Synthetic core promoters as universal parts for fine-tuning expression in different yeast species [J]. ACS Synthetic Biology, 2017, 6(3): 471-484. |
12 | BLAZECK J, LIU Leqian, REDDEN H, et al. Tuning gene expression in Yarrowia lipolytica by a hybrid promoter approach [J]. Applied & Environmental Microbiology, 2011, 77(22): 7905-7914. |
13 | URTECHO G, TRIPP A D, INSIGNE K D, et al. Systematic dissection of sequence elements controlling σ70 promoters using a genomically encoded multiplexed reporter assay in Escherichia coli [J]. Biochemistry, 2018, 58(11): 1539-1551. |
14 | HUANG Po Ssu, BOYKEN S E, BAKER D. The coming of age of de novo protein design [J]. Nature, 2016, 537(7620): 320-327. |
15 | MENG Hailin, MA Yingfei, MAI Guoqin, et al. Construction of precise support vector machine based models for predicting promoter strength [J]. Quantitative Biology, 2017, 5(1): 90-98. |
16 | ZOU J, HUSS M, ABID A, et al. A primer on deep learning in genomics [J]. Nature Genetics, 2019, 51(1): 12-18. |
17 | QUANG D, XIE Xiaohui. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences [J]. Nucleic Acids Research, 2016, 44(11): e107-e107. |
18 | SINGH S, YANG Yang, BARNAB, et al. Predicting enhancer-promoter interaction from genomic sequence with deep neural networks [J]. Quantitative Biology, 2019, 7(2): 122-137. |
19 | SINGH R, LANCHANTIN J, ROBINS G, et al. DeepChrome: deep-learning for predicting gene expression from histone modifications [J]. Bioinformatics, 2016, 32(17): i639-i648. |
20 | CHEN Yifei, LI Yi, NARAYAN R, et al. Gene expression inference with deep learning [J]. Bioinformatics, 2016, 32(12): 1832-1839. |
21 | GLIGORIJEVIĆ V, BAROT M, BONNEAU R. deepNF: deep network fusion for protein function prediction [J]. Bioinformatics, 2018, 34(22): 3873-3881. |
22 | KINGMA D P, WELLING M. Auto-encoding variational Bayes [C]// 2nd International Conference on Learning Representations, Banff, Canada, 2014. |
23 | GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets [J]. Advances in Neural Information Processing Systems, 2014: 2672-2680. |
24 | LEDIG C, THEIS L, HUSZÁR F, et al. Photo-realistic single image super-resolution using a generative adversarial network [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. |
25 | ZHU Junyan, PARK Taesung, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks [C]// Proceedings of the IEEE International Conference on Computer Vision, 2017. |
26 | LI Chuan, WAND M. Precomputed real-time texture synthesis with Markovian generative adversarial networks [C]// European Conference on Computer Vision, 2016. |
27 | YANG Xiufeng, ZHANG J, YOSHIZOE K, et al. ChemTS: an efficient python library for de novo molecular generation [J]. Science and Technology of Advanced Materials, 2017, 18(1): 972-976. |
28 | SAIKIN S K, KREISBECK C, SHEBERLA D, et al. Closed-loop discovery platform integration is needed for artificial intelligence to make an impact in drug discovery [J]. Expert Opinion on Drug Discovery, 2019, 14(1): 1-4. |
29 | PUTIN E, ASADULAEV A, VANHAELEN Q, et al. Adversarial threshold neural computer for molecular de novo design [J]. Molecular Pharmaceutics, 2018, 15(10): 4386-4397. |
30 | MERK D, FRIEDRICH L, GRISONI F, et al. De novo design of bioactive small molecules by artificial intelligence [J]. Molecular Informatics, 2018, 37: 1-2. |
31 | WANG Ye, WANG Haochen, WEI Lei, et al. Synthetic promoter design in Escherichia coli based on a deep generative network [J]. Nucleic Acids Research, 2020, 48(12): 6403-6412. |
32 | CHUAI Guohui, MA Hanhui, YAN Jifang, et al. DeepCRISPR: optimized CRISPR guide RNA design by deep learning [J]. Genome Biology, 2018, 19(1): 80. |
33 | WANG Daqi, ZHANG Chengdong, WANG Bei, et al. Optimized CRISPR guide RNA design for two high-fidelity Cas9 variants by deep learning [J]. Nature Communications, 2019, 10(1): 1-14. |
34 | WANG Jun, ZHANG Xiuqing, CHENG Lixin, et al. An overview and metanalysis of machine and deep learning-based CRISPR gRNA design tools [J]. RNA Biology, 2020, 17(1): 13-22. |
35 | GRISONI F, NEUHAUS C S, GABERNET G, et al. Designing anticancer peptides by constructive machine learning [J]. ChemMedChem, 2018, 13(13): 1300-1302. |
36 | LI Ruifeng, WIJMA H J, SONG Lu, et al. Computational redesign of enzymes for regio- and enantioselective hydroamination [J]. Nature Chemical Biology, 2018, 14(7): 664-670. |
37 | YANG K K, WU Z, ARNOLD F H. Machine-learning-guided directed evolution for protein engineering [J]. Nature Methods, 2019, 16(8): 687-694. |
38 | SAITO Y, OIKAWA M, NAKAZAWA H, et al. Machine-learning-guided mutagenesis for directed evolution of fluorescent proteins [J]. ACS Synthetic Biology, 2018, 7(9): 2014-2022. |
39 | POPOVA M, ISAYEV O, TROPSHA A. Deep reinforcement learning for de novo drug design [J]. Science Advances, 2018, 4(7): eaap7885. |
40 | RIESSELMAN A J, INGRAHAM J B, MARKS D S. Deep generative models of genetic variation capture the effects of mutations [J]. Nature Methods, 2018, 15(10): 816-822. |
41 | GUPTA A, ZOU J. Feedback GAN for DNA optimizes protein functions [J]. Nature Machine Intelligence, 2019, 1(2): 105-111. |
42 | SATTAROV B, BASKIN I I, HORVATH D, et al. De novo molecular design by combining deep autoencoder recurrent neural networks with generative topographic mapping [J]. Journal of Chemical Information and Modeling, 2019, 59(3): 1182-1196. |
43 | GÓMEZ-BOMBARELLI R, WEI J N, DUVENAUD D, et al. Automatic chemical design using a data-driven continuous representation of molecules [J]. ACS Central Science, 2018, 4(2): 268-276. |
44 | OLIVECRONA M, BLASCHKE T, ENGKVIST O, et al. Molecular de-novo design through deep reinforcement learning [J]. Journal of Cheminformatics, 2017, 9(1): 48-61. |
45 | TAN Jing. Research on generative model based on deep learning [D]. Beijing: Beijing University of Posts and Telecommunications, 2019. |
46 | ANAND N, HUANG P. Generative modeling for protein structures [C]// Advances in Neural Information Processing Systems, 2018. |
47 | LI Yibo, ZHANG Liangren, LIU Zhenming. Multi-objective de novo drug design with conditional graph generative model [J]. Journal of Cheminformatics, 2018, 10(1): 33-57. |
48 | MAATEN L VAN DER, HINTON G. Visualizing data using t-SNE [J]. Journal of Machine Learning Research, 2008, 9: 2579-2605. |
49 | CONSORTIUM U P. Reorganizing the protein space at the Universal Protein Resource (UniProt) [J]. Nucleic Acids Research, 2012, 40: D71. |
50 | WEININGER D. SMILES, a chemical language and information system (I): Introduction to methodology and encoding rules [J]. Journal of Chemical Information and Computer Sciences, 1988, 28(1): 31-36. |
51 | SKALIC M, JIMENEZ J, SABBADIN D, et al. Shape-based generative modeling for de novo drug design [J]. Journal of Chemical Information and Modeling, 2019, 59(3): 1205-1214. |
52 | KADURIN A, NIKOLENKO S, KHRABROV K, et al. druGAN: an advanced generative adversarial autoencoder model for de novo generation of new molecules with desired molecular properties in silico [J]. Molecular Pharmaceutics, 2017, 14(9): 3098-3104. |
53 | LIM Jaechang, RYU Seongok, KIM Jin Woo, et al. Molecular generative model based on conditional variational autoencoder for de novo molecular design [J]. Journal of Cheminformatics, 2018, 10(1): 31-40. |
54 | HESSLER G, BARINGHAUS K H. Artificial intelligence in drug design [J]. Molecules, 2018, 23(10): 2520-2533. |
55 | BLASCHKE T, OLIVECRONA M, ENGKVIST O, et al. Application of generative autoencoder in de novo molecular design [J]. Molecular Informatics, 2018, 37(1/2): 1-11. |
56 | HOCHREITER S, SCHMIDHUBER J. Long short-term memory [J]. Neural Computation, 1997, 9(8): 1735-1780. |
57 | SEGLER M H S, KOGEJ T, TYRCHAN C, et al. Generating focused molecule libraries for drug discovery with recurrent neural networks [J]. ACS Central Science, 2018, 4(1): 120-131. |
58 | ALLEY E C, KHIMULYA G, BISWAS S, et al. Unified rational protein engineering with sequence-based deep representation learning [J]. Nature Methods, 2019, 16(12): 1315-1322. |
59 | DE CAO N, KIPF T. MolGAN: An implicit generative model for small molecular graphs [C]// ICML 2018 Workshop on Theoretical Foundations and Applications of Deep Generative Models, 2018. |
60 | RAZAVI A, OORD A VAN DEN, VINYALS O. Generating diverse high-fidelity images with VQ-VAE-2 [C]// Advances in Neural Information Processing Systems, 2019. |
61 | REIG A J, PIRES M M, SNYDER R A, et al. Alteration of the oxygen-dependent reactivity of de novo Due Ferri proteins [J]. Nature Chemistry, 2012, 4(11): 900-906. |
62 | KILLORAN N, LEE L J, DELONG A, et al. Generating and designing DNA with deep generative models [EB/OL]. [2017-12-17]. . |
63 | JIN Wengong, BARZILAY R, JAAKKOLA T S. Junction tree variational autoencoder for molecular graph generation [C]// International Conference on Machine Learning, 2018. |
64 | SIMONOVSKY M, KOMODAKIS N. GraphVAE: towards generation of small graphs using variational autoencoders [C]// International Conference on Artificial Neural Networks, 2018. |
65 | AWALE M, SIROCKIN F, STIEFL N, et al. Drug analogs from fragment-based long short-term memory generative neural networks [J]. Journal of Chemical Information and Modeling, 2019, 59(4): 1347-1356. |
66 | YU Lantao, ZHANG Weinan, WANG Jun, et al. SeqGAN: Sequence generative adversarial nets with policy gradient [C]// Thirty-First AAAI Conference on Artificial Intelligence, 2017. |
67 | ZHAVORONKOV A, IVANENKOV Y A, ALIPER A, et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors [J]. Nature Biotechnology, 2019, 37(9): 1038-1040. |
68 | COULOM R. Efficient selectivity and backup operators in Monte-Carlo tree search [C]// International Conference on Computers and Games, 2006. |
69 | XIONG Peng, WANG Meng, ZHOU Xiaoqun, et al. Protein design with a comprehensive statistical energy function and boosted by experimental selection for foldability [J]. Nature Communications, 2014, 5(1): 1-9. |
70 | RASMUSSEN C E. Gaussian processes in machine learning [C]// Summer School on Machine Learning, 2003. |
71 | KOTOPKA B J, SMOLKE C D. Model-driven generation of artificial yeast promoters [J]. Nature Communications, 2020, 11(1): 2113. |
72 | DAS R, BAKER D. Macromolecular modeling with Rosetta [J]. Annual Review of Biochemistry, 2008, 77: 363-382. |
73 | MÉNDEZ-LUCIO O, BAILLIF B, CLEVERT D-A, et al. De novo generation of hit-like molecules from gene expression signatures using artificial intelligence [J]. Nature Communications, 2020, 11(1): 1-10. |
74 | POLYKOVSKIY D, ZHEBRAK A, VETROV D, et al. Entangled conditional adversarial autoencoder for de novo drug discovery [J]. Molecular Pharmaceutics, 2018, 15(10): 4398-4405. |
75 | KANG Seokho, CHO Kyunghyun. Conditional molecular design with deep generative models [J]. Journal of Chemical Information and Modeling, 2019, 59(1): 43-52. |
76 | SOHN Kihyuk, YAN Xinchen, LEE Honglak. Learning structured output representation using deep conditional generative models [C]// Advances in Neural Information Processing Systems, 2015. |
77 | CAO Qin, ZHANG Zhenghao, FU A Xi, et al. A unified framework for integrative study of heterogeneous gene regulatory mechanisms [J]. Nature Machine Intelligence, 2020, 2(8): 447-456. |
78 | ProGen: language modeling for protein generation[EB/OL]. [2020-03-07]. . |
79 | WAN Cen, JONES D T. Protein function prediction is improved by creating synthetic feature samples with generative adversarial networks [J]. Nature Machine Intelligence, 2020, 2: 540-550. |
80 | HE Fei, WANG Rui, LI Jiagen, et al. Large-scale prediction of protein ubiquitination sites using a multimodal deep architecture [J]. BMC Systems Biology, 2018, 12(6): 109. |
81 | QI Yifei, ZHANG J Z H. DenseCPD: improving the accuracy of neural-network-based computational protein sequence design with DenseNet [J]. Journal of Chemical Information and Modeling, 2020, 60(3): 1245-1252. |
82 | KUSNER M, PAIGE B, HERNÁNDEZ-LOBATO J. Grammar variational autoencoder [C]// Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 2017. |
83 | ANDERSSON R, SANDELIN A. Determinants of enhancer and promoter activities of regulatory elements [J]. Nature Reviews Genetics, 2020, 21(2): 71-87. |
84 | CHING T, HIMMELSTEIN D S, BEAULIEU-JONES B K, et al. Opportunities and obstacles for deep learning in biology and medicine [J]. Journal of The Royal Society Interface, 2018, 15(141): 20170387. |
85 | BROWN N, FISCATO M, SEGLER M H S, et al. GuacaMol: benchmarking models for de novo molecular design [J]. Journal of Chemical Information and Modeling, 2019, 59(3): 1096-1108. |
86 | ROGERS D, HAHN M. Extended-connectivity fingerprints [J]. Journal of Chemical Information and Modeling, 2010, 50(5): 742-754. |
87 | BUTINA D. Unsupervised data base clustering based on daylight's fingerprint and Tanimoto similarity: a fast and automated way to cluster small and large data sets [J]. Journal of Chemical Information and Computer Sciences, 1999, 39(4): 747-750. |
88 | LANDRUM. RDKit: open-source cheminformatics [CP]. http://www.rdkit.org. (released June 13, 2018). |
89 | DOU Jiayi, VOROBIEVA A A, SHEFFLER W, et al. De novo design of a fluorescence-activating beta-barrel [J]. Nature, 2018, 561(7724): 485-491. |
90 | SZEGEDY C, LIU Wei, JIA Yangqing, et al. Going deeper with convolutions [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. |
91 | PREUER K, RENZ P, UNTERTHINER T, et al. Fréchet ChemNet distance: a metric for generative models for molecules in drug discovery [J]. Journal of Chemical Information and Modeling, 2018, 58(9): 1736-1741. |
92 | FOX R, ROY A, GOVINDARAJAN S, et al. Optimizing the search algorithm for protein engineering by directed evolution [J]. Protein Engineering, 2003, 16(8): 589-597. |
93 | WU N C, DAI Lei, OLSON C A, et al. Adaptation in protein fitness landscapes is facilitated by indirect paths [J]. eLife, 2016, 5: e16965. |
94 | MELNIKOV A, MURUGAN A, ZHANG Xiaolan, et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay [J]. Nature Biotechnology, 2012, 30(3): 271-277. |
95 | TROTT O, OLSON A J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading [J]. Journal of Computational Chemistry, 2010, 31(2): 455-461. |
96 | DING Wentao, CHENG Jian, GUO Dan, et al. Engineering the 5′ UTR-mediated regulation of protein abundance in yeast using nucleotide sequence activity relationships [J]. ACS Synthetic Biology, 2018, 7(12): 2709-2714. |
97 | YAN Minghao. https://github.com/XWangLabTHU/Gpro [CP]. Tsinghua University. (released March 15, 2020). |
98 | LIU Yanfeng, LIU Long, LI Jianghua, et al. Synthetic biology toolbox and chassis development in Bacillus subtilis [J]. Trends in Biotechnology, 2019, 37(5): 548-562. |
99 | BENTO A P, GAULTON A, HERSEY A, et al. The ChEMBL bioactivity database: an update [J]. Nucleic Acids Research, 2014, 42(D1): D1083-D1090. |
100 | WANG Jingxue, CAO Huali, ZHANG J Z H, et al. Computational protein design with deep learning neural networks [J]. Scientific Reports, 2018, 8(1): 1-9. |
101 | BUTTON A, MERK D, HISS J A, et al. Automated de novo molecular design by hybrid machine intelligence and rule-driven chemical synthesis [J]. Nature Machine Intelligence, 2019, 1(7): 307-315. |