Synthetic Biology Journal, 2021, Vol. 2, Issue (3): 428-443. DOI: 10.12211/2096-8280.2021-023
• Research Article •
Weigang CHEN1,2, Qi GE1, Panpan WANG1, Mingzhe HAN2,3, Jian GUO1
Received: 2021-02-09
Revised: 2021-03-28
Online: 2021-06-30
Published: 2021-07-14
Contact: Weigang CHEN
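Abstract:
Synthetic DNA is a promising medium for digital information storage: it offers high storage density and long retention time, and is expected to become an important option for future data storage. However, DNA synthesis and sequencing readout often introduce several types of base errors, so the reliability requirements of data storage are not met, while coding schemes that guarantee reliability tend to be inefficient. To address this problem, a high-efficiency coding method is proposed for storing data in large DNA fragments in Saccharomyces cerevisiae. For encoding, codewords of multiple very-high-rate Reed-Solomon (RS) codes are interleaved to construct data DNA units, which are alternately interspersed with yeast autonomously replicating sequences (ARS) to form a yeast artificial chromosome sequence. For readout, second-generation high-throughput sequencing is used, combining de novo read assembly, ARS-guided […] example, the original data can be recovered without error from 20× second-generation sequencing data. The coding method not only stores data reliably but also achieves a logical density of 1.973 bit/bp for the DNA data portion; even when the overhead of the biological units is included, the overall logical density still reaches 1.947 bit/bp. The design flow supports encoding of DNA ranging from kb to Mb in length, providing flexible pre-experiment verification and evaluation for "wet" experiments on large-fragment DNA data storage.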
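The interleaving step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the RS encoder itself is omitted (the codewords are plain placeholder byte strings), and the function names, the symbol-wise block interleaver, and the 2-bit mapping 00→A, 01→C, 10→G, 11→T are all assumptions made for illustration. It shows only why interleaving helps: a burst of consecutive errors in the stored sequence is spread over several codewords, so each codeword sees few errors.

```python
from typing import List

BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def interleave(codewords: List[bytes]) -> bytes:
    """Symbol-wise block interleaving: symbol 0 of every codeword,
    then symbol 1 of every codeword, and so on."""
    length = len(codewords[0])
    if any(len(cw) != length for cw in codewords):
        raise ValueError("all codewords must have the same length")
    return bytes(cw[i] for i in range(length) for cw in codewords)


def deinterleave(stream: bytes, depth: int) -> List[bytes]:
    """Inverse of interleave() for `depth` codewords."""
    if len(stream) % depth:
        raise ValueError("stream length must be a multiple of depth")
    return [stream[j::depth] for j in range(depth)]


def bytes_to_bases(data: bytes) -> str:
    """Map each byte to four bases, two bits per base, most significant first."""
    return "".join(BASES[(b >> shift) & 3] for b in data for shift in (6, 4, 2, 0))


if __name__ == "__main__":
    # Placeholder "codewords"; a real system would use RS-encoded blocks.
    codewords = [bytes([k]) * 5 for k in range(4)]
    stream = bytearray(interleave(codewords))
    stream[6:10] = b"\xff" * 4  # burst of 4 consecutive symbol errors
    damaged = deinterleave(bytes(stream), depth=4)
    for original, received in zip(codewords, damaged):
        errors = sum(a != b for a, b in zip(original, received))
        print(errors)  # each codeword is hit by exactly one symbol error
    print(bytes_to_bases(b"\x1b"))  # ACGT
```

Under the same assumed 2 bit per base mapping, the reported 1.973 bit/bp would correspond to a code rate of roughly 1.973 / 2 ≈ 0.9865, and the ratio 1.947 / 1.973 ≈ 0.987 would be the fraction of bases that carry data rather than biological units; these figures are back-of-the-envelope inferences, not values stated in the source.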
Weigang CHEN, Qi GE, Panpan WANG, Mingzhe HAN, Jian GUO. Multiple interleaved RS codes for data storage using up to Mb-scale synthetic DNA in living cells[J]. Synthetic Biology Journal, 2021, 2(3): 428-443.