Synthetic Biology Journal, 2023, Vol. 4, Issue (3): 611-627. DOI: 10.12211/2096-8280.2022-075

• Invited Review •

Microbiome-based biosynthetic gene cluster data mining techniques and application potentials

Qilong LAI, Shuai YAO, Yuguo ZHA, Hong BAI, Kang NING   

  1. Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
  • Received: 2022-12-26 Revised: 2022-03-10 Online: 2023-07-05 Published: 2023-06-30
  • Contact: Hong BAI, Kang NING

Methods for mining biosynthetic gene clusters from microbiomes and their application prospects (微生物组生物合成基因簇发掘方法及应用前景)

赖奇龙 (Qilong LAI), 姚帅 (Shuai YAO), 查毓国 (Yuguo ZHA), 白虹 (Hong BAI), 宁康 (Kang NING)

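  1. College of Life Science and Technology, Huazhong University of Science and Technology; Key Laboratory of Molecular Biophysics of the Ministry of Education; Hubei Key Laboratory of Bioinformatics and Molecular-imaging; Center of Artificial Intelligence Biology; Department of Bioinformatics and Systems Biology, Wuhan 430074, Hubei
  • Corresponding authors: Hong BAI, Kang NING
  • About the authors: Qilong LAI (born 2000), male, bachelor's degree. Research interests: bioinformatics and artificial intelligence biology. E-mail: laiqilong@hust.edu.cn
    Hong BAI (born 1978), female, professor-level senior engineer. Research interests: natural product chemistry and microbiology. E-mail: baihong@hust.edu.cn
    Kang NING (born 1979), male, professor and doctoral supervisor. Research interests: bioinformatics, microbiomics, and artificial intelligence biology. E-mail: ningkang@hust.edu.cn
  • Funding:
    National Natural Science Foundation of China (32071465); National Key Research and Development Program of China (2021YFA0910500)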

Abstract:

Biosynthetic gene clusters (BGCs) are an important type of gene set. They are commonly found in the genomes of diverse organisms and play important metabolic and regulatory roles. Structurally, the genes in a BGC usually lie in close proximity to one another in the genome; functionally, they usually work together in a pathway that produces a specific small molecule. BGCs are therefore vital to synthetic biology research as a highly promising source of genetic elements. However, current BGC databases and analytical platforms are limited by the number and types of experimentally validated BGCs, as well as by BGC data mining techniques that are still at a preliminary stage. Establishing data-driven, systematic discovery of BGCs, together with their validation and translational study, is of great value for both fundamental research and practical applications. This article focuses on mining BGCs from microbiome big data for synthetic biology research. We first discuss the definition and significance of BGC mining and summarize current data resources and methods, including MIBiG, antiSMASH, and IMG-ABC, along with artificial intelligence (AI)-enabled web services that accelerate BGC mining. We then walk through how a typical BGC data mining workflow could be conducted and highlight the history of BGC mining methods, which traces a path from traditional machine learning to deep learning. We also diagnose bottlenecks in BGC mining and propose possible solutions. Furthermore, drawing on several BGC mining and validation experiments, we illustrate the diversity and breadth of application scenarios for BGC discovery, as well as the importance of combining dry-lab and wet-lab experiments to validate newly discovered BGCs. Finally, we envision that combining advanced BGC mining methods with synthetic biology could broaden and deepen current synthetic biology research.

Key words: biosynthetic gene cluster, artificial intelligence, synthetic biology, microbiome

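Abstract (translated from the Chinese):

Biosynthetic gene clusters (BGCs) are a very important type of gene set. BGCs are widespread in the genomes of all kinds of organisms and play important metabolic and regulatory roles. In terms of linear structure, the genes in a BGC are usually adjacent to one another in the genome; in terms of gene function, the genes in a BGC usually act together in a pathway that produces a specific small-molecule compound. BGCs are therefore extremely important in synthetic biology research as a highly promising source of genetic elements. In terms of sequence patterns, however, a BGC contains many genes whose sequences differ greatly, which makes it difficult to discover new types of BGCs through sequence homology. Consequently, establishing intelligent strategies for mining BGCs, and systematically discovering, validating, and translating them, is of great value both in theory and in practical application. Based mainly on microbiome big data, this article gives a fairly comprehensive account of the significance and bottlenecks of BGC mining, systematically summarizes the data resources and mining methods currently used for BGC discovery, especially artificial intelligence methods, points out the value of combining dry-lab and wet-lab approaches for validating newly discovered BGCs, and illustrates the diversity and broad application areas of newly discovered BGCs. Finally, it looks ahead to how combining existing BGC mining methods with synthetic biology translation can extend current synthetic biology research in breadth and depth.

Key words (translated from the Chinese): biosynthetic gene cluster, artificial intelligence, synthetic biology, microbiome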

CLC Number: