Synthetic Biology Journal, 2021, Vol. 2, Issue (3): 384-398. DOI: 10.12211/2096-8280.2020-085

• Invited Review

The pivotal biochemical methods in DNA data storage

Yanmin GAO, Mengtong TANG, Qian LIU, Hongyan QIAO, Taoxue WANG, Hao QI   

  1. Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China
  • Received: 2020-11-30 Revised: 2021-02-08 Online: 2021-07-13 Published: 2021-06-30
  • Contact: Hao QI

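Research on key biochemical methods in DNA data storage (Chinese title: DNA信息存储中关键生化方法的研究)

Yanmin GAO (郜艳敏), Mengtong TANG (唐梦童), Qian LIU (刘倩), Hongyan QIAO (乔宏艳), Taoxue WANG (王桃雪), Hao QI (齐浩)

  1. Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China
  • Corresponding author: Hao QI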
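  • About the authors: Yanmin GAO (born 1990), female, PhD candidate. Research interests: DNA data storage and nucleic acid detection. E-mail: xiaomingao@tju.edu.cn
    Mengtong TANG (born 1995), female, master's student. Research interest: DNA data storage. E-mail: tangmengtong126@126.com
    Hao QI (born 1978), male, professor and doctoral supervisor. Research interests: synthetic biology and molecular biology. E-mail: haoq@tju.edu.cn
  • Funding:
    National Key Research and Development Program of China, key special project "Key Scientific Issues of Transformative Technologies" (2020YFA0712104); National Natural Science Foundation of China (21778039)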

Abstract:

With rapid progress in biotechnology, especially array-based DNA synthesis and next-generation sequencing (NGS), DNA has demonstrated great advantages in data storage capacity, storage stability, and repeatable reading. However, the biochemical methods currently used to manipulate large-scale oligonucleotide (oligo) pools carrying digital information still face major challenges. For example, DNA integrity and stability are affected by preservation conditions such as temperature and humidity. Dropout and mutation (substitution, insertion, or deletion) of oligos are aggravated by biased manipulations, including chemical synthesis, amplification (PCR), and NGS. Large unevenness in oligo copy number means that more sequencing resources are required to recover all necessary strands in the pool. In addition, missing sequences and base errors increase the cost of decoding. As a result, DNA data storage is still confined to the laboratory. From the perspective of the biochemical methods for manipulating large-scale oligo pools, we summarize the causes of biochemical problems such as heterogeneity of oligo copy number, mutation, and DNA decay during microarray DNA synthesis, storage, and amplification. We also summarize a series of biochemical methods developed to address these problems, from oligo synthesis to amplification. These methods include improved synthesis processes, adjusted chemical process parameters, modified oligo pool normalization methods, optimized PCR conditions, variant PCR (emulsion PCR), and novel isothermal amplification (strand displacement amplification). In addition, measures can be taken in the encoding strategy to mitigate oligo copy unevenness and aid error correction. Moreover, we demonstrate the feasibility and efficiency of these biochemical methods in reducing the abovementioned problems in DNA data storage.
Finally, we discuss and analyze the challenges facing existing DNA data storage. With the development of biotechnology and of encoding and decoding strategies, we believe that these bottleneck issues will be solved and that DNA data storage will reach real-world application in the near future.
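
The point that uneven oligo copy numbers demand more sequencing to recover every strand can be illustrated with a toy simulation. The sketch below is not taken from the review: it assumes reads are drawn independently with probability proportional to copy number, and the pool size (200 oligos) and the 10-fold skew are hypothetical values chosen only for illustration.

```python
import random


def reads_to_recover_all(weights, rng):
    """Count reads drawn (proportional to copy number) until every oligo is seen once."""
    n = len(weights)
    population = list(range(n))
    seen = set()
    reads = 0
    while len(seen) < n:
        # Draw reads in batches for speed; stop counting as soon as the pool is complete.
        for idx in rng.choices(population, weights=weights, k=n):
            reads += 1
            seen.add(idx)
            if len(seen) == n:
                break
    return reads


def mean_reads(weights, trials=20, seed=0):
    """Average the number of reads needed over several simulated sequencing runs."""
    rng = random.Random(seed)
    return sum(reads_to_recover_all(weights, rng) for _ in range(trials)) / trials


n_oligos = 200  # hypothetical pool size

# Even pool: every oligo has the same copy number.
uniform = [1.0] * n_oligos

# Uneven pool (assumed skew): half of the oligos are 10x more abundant than the rest.
skewed = [10.0] * (n_oligos // 2) + [1.0] * (n_oligos // 2)

uniform_reads = mean_reads(uniform)
skewed_reads = mean_reads(skewed)

print(f"even pool:   about {uniform_reads:.0f} reads to see every oligo")
print(f"uneven pool: about {skewed_reads:.0f} reads to see every oligo")
```

In this model the rare oligos dominate the sequencing effort, which is one way to see why normalizing the pool or reducing amplification bias lowers the reading cost.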

Key words: DNA data storage, array-based DNA synthesis, oligo copy unevenness, oligo pool normalization, amplification bias, PCR, isothermal amplification

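Abstract (translated from the Chinese version):

With the development of biotechnology, especially high-throughput DNA synthesis and sequencing, DNA data storage has made major advances in storage capacity, stability, and repeatable reading. However, the biochemical manipulation of large-scale oligonucleotide pools carrying data still faces great challenges, such as oligonucleotide unevenness, loss of DNA sequences, base mutations, and decay of DNA molecules caused by synthesis, amplification, preservation, or sequencing. These factors limit the progress of DNA data storage toward practical industrial application. From the perspective of the biochemical techniques for manipulating large oligonucleotide pools, this review summarizes the causes of these biochemical problems and a series of methods developed to address them, including improved synthesis processes, normalization of oligonucleotide pools, various DNA preservation methods, variant PCR, and isothermal amplification. It demonstrates their feasibility and effectiveness in avoiding the above problems in the DNA data storage workflow, and finally analyzes the current challenges of DNA data storage in synthesis, long-term preservation, and amplification. This review aims to lay a foundation for manipulating large oligonucleotide pools and thereby to promote the progress of DNA data storage toward practical application.

Key words (translated from the Chinese version): DNA data storage, chip-based synthesis, oligonucleotide unevenness, pool normalization, amplification bias, PCR, isothermal amplification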
