Table of Contents

    30 June 2021, Volume 2 Issue 3
    Current contents in Chinese and English
    2021, 2(3):  0. 
    Comment
    The current advance in artificial chromosome based DNA information storage
    Yang YANG, Chunhai FAN
    2021, 2(3):  305-308.  doi:10.12211/2096-8280.2021-043

    DNA offers an alternative material for digital storage, with high information density and long-term stability. Prof. Yingjin Yuan's group at Tianjin University, Tianjin, China, has recently achieved the de novo design and synthesis of an artificial chromosome that encodes two pictures and a video clip. This study establishes a data storage strategy using encoded artificial chromosomes assembled in vivo. It allows multiple retrievals from a single write through stable replication, which advances economical large-scale data distribution.

    Invited Review
    DNA information storage: bridging biological and digital world
    Mingzhe HAN, Weigang CHEN, Lifu SONG, Bingzhi LI, Yingjin YUAN
    2021, 2(3):  309-322.  doi:10.12211/2096-8280.2021-001

    The external preservation of information enables the reliable inheritance of human thought and has played an important role in the progress of human civilization. From tying knots in ropes to storing data on magnetic and optical media, storage technologies have documented, and will continue to record, that civilization. However, driven by global digitalization, the global data volume is growing rapidly and challenging the capacity of existing storage technologies. DNA, the natural carrier of genetic information, is considered a potential candidate for meeting this challenge because of its high density, long-term durability and low maintenance cost. In this review, we first describe the fundamental principles and technical processes of DNA information storage and point out its pivotal position in bridging the biological and digital worlds. Then, according to the different characteristics of data writing and reading, we categorize these technologies into three storage modes, termed "DNA hard drive", "DNA compact disc" and "DNA tape" by analogy with the corresponding popular storage media. The "DNA hard drive" mode shows potential for enlarging the capacity of existing information storage systems using oligonucleotide pools. The "DNA compact disc" mode provides direct in vivo processing of DNA data storage, enabling massive data distribution at low cost. The "DNA tape" mode provides intracellular information recording solutions, which may promote the future development of cellular computing and communication. The up-to-date progress of these three modes is also summarized. We then discuss the main obstacles to, and potential technical routes towards, practical applications of DNA information storage. We envision a cheaper, faster DNA information storage technology and its appropriate integration with information storage systems in the future. Finally, we conclude that DNA information storage is a cutting-edge interdisciplinary technology and hope this review can bring more attention and research effort from various fields to DNA information storage.

    Research progress on DNA molecules for digital information storage
    Yiming DONG, Fajia SUN, Ruijun WU, Long QIAN
    2021, 2(3):  323-334.  doi:10.12211/2096-8280.2020-086

    With the development of information technology, digital information storage has gone through unprecedented changes. Traditional storage media such as magnetic and optical devices increasingly fall short of the global need for data storage, which calls for more effective storage media. The extraordinary stability, storage capacity and storage density of DNA molecules make them a promising novel information storage medium. In this review, we first introduce the basic principles and processes of using DNA molecules to store artificial information, and highlight the latest research results on DNA storage from the past few years. Next, we compare DNA molecules with current mainstream data storage media in terms of performance and cost. DNA molecules excel in data storage density, storage life, maintenance cost and their potential integration with living cells. Finally, we provide a detailed review of the factors that curb the development of DNA information storage, such as data security, writing and reading speeds, and storage cost. We also comment briefly on emerging biotechnological areas that could bring breakthroughs to the field of DNA storage, such as DNA barcoding and DNA origami. At present, information storage with DNA molecules provides a novel solution for cold data storage. Nevertheless, we are optimistic that multidisciplinary principles and techniques will continue to expand the application scenarios for DNA information storage.

    DNA synthesis technology: foundation of DNA data storage
    Xiaoluo HUANG, Junbiao DAI
    2021, 2(3):  335-353.  doi:10.12211/2096-8280.2020-088

    DNA-based data storage technology has many considerable advantages and has been suggested as one of the most promising technologies for coping with a future crisis in information storage. It involves the conversion of real information into A/T/C/G sequences, the synthesis of preservable DNA polymers by DNA synthesis technology, and data deciphering by DNA sequencing technology. Nevertheless, the current cost of DNA synthesis is still high, which greatly limits the rapid development and industrial application of DNA data storage. As the key technology of DNA data storage, DNA synthesis lays the foundation for its practical application. Since the first oligonucleotides were made in the 1950s, DNA synthesis technology has been rapidly developed and commercialized, giving rise to DNA synthesizers with different throughputs that can produce anything from oligonucleotides of dozens of nucleotides to Mb-level microbial genomes. In this review, we systematically summarize the key research progress of DNA synthesis technology along its historical development, which includes column-based chemical oligonucleotide synthesis, chip-based chemical oligonucleotide synthesis, oligonucleotide purification, oligonucleotide assembly, error correction and gene cloning, large-fragment gene synthesis, genome synthesis and next-generation enzymatic DNA synthesis. Currently, the widely used DNA synthesis technology starts from the chemical synthesis of oligonucleotides. Although a number of chemical technologies have been proposed, the one typically used is the "phosphoramidite" method, which includes the steps of "deprotection", "coupling", "capping" and "oxidation". Chemical synthesis generally produces single-stranded oligonucleotides of less than 200 nt. For double-stranded DNA synthesis, the single-stranded oligonucleotides need to be assembled. Oligonucleotide assembly technologies, including ligase chain assembly (LCA) and polymerase chain assembly (PCA), were therefore developed and have been widely applied in commercial gene synthesis. Following the development of chemical oligonucleotide synthesis and gene synthesis technologies, several bacterial genomes and yeast chromosomes have been successfully synthesized using the strategies of "one-time de novo synthesis" or "gradual replacement synthesis". Meanwhile, new enzymatic DNA synthesis technology has also made considerable progress in recent years, opening up a new path for synthetic biologists. In addition to these key research developments, we further summarize and analyze the impact of key parameters of DNA synthesis technology, such as length, cost and speed, on DNA data storage, in order to provide some references and ideas for the development and practical application of the entire DNA data storage process. Finally, we envision the future trends of DNA synthesis technology, including cost reduction, the further development of genome synthesis and enzymatic DNA synthesis technologies, and the establishment of a faster, lower-cost DNA synthesis technology that produces longer fragments for DNA data storage.
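
    The length limit on chemically synthesized oligonucleotides can be understood from how per-cycle losses compound. The following is a minimal sketch, assuming a uniform stepwise coupling efficiency and ignoring depurination and other side reactions. The function name and the 99.5% and 99% efficiencies are illustrative assumptions, not values taken from the review.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of strands that reach full length.

    Assumes (hypothetically) that every one of the (length_nt - 1) coupling
    steps succeeds independently with the same probability.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must lie in [0, 1]")
    return coupling_efficiency ** (length_nt - 1)


# Illustrative inputs only: compare two assumed efficiencies across lengths.
for length in (20, 100, 200, 400):
    high = full_length_fraction(length, 0.995)
    low = full_length_fraction(length, 0.99)
    print(f"{length:>4} nt: {high:.3f} at 99.5%, {low:.3f} at 99%")
```

    Under these assumptions, a 200-nt strand reaches full length roughly 37% of the time at 99.5% per-step efficiency and only about 13.5% of the time at 99%, which is consistent with longer constructs being assembled from shorter oligonucleotides rather than synthesized directly.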

    In situ chemical synthesis of DNA microarrays
    Han YAN, Pengfeng XIAO, Quanjun LIU, Zuhong LU
    2021, 2(3):  354-370.  doi:10.12211/2096-8280.2020-089

    High-throughput, rapid and low-cost DNA synthesis is a core technology in research fields such as synthetic biology, DNA storage and DNA chips. The in situ chemical synthesis of DNA microarrays is based on the principle of solid-phase phosphoramidite chemical synthesis, and integrates technologies from microelectronics, computational science, molecular biology, photo-electrochemistry and micro-nano processing. Over the past 30 years, the technology has developed rapidly. Based on their base allocation strategy, in situ chemical synthesis methods can be grouped into photolithography, photo-acid methods, electro-acid methods, printing methods, imprinting methods and others. In this paper, we discuss the different in situ chemical synthesis methods for DNA microarrays and their technical characteristics, as well as the potential development trends of DNA synthesis methods. We believe that, in terms of synthesis throughput and efficiency, electro-acid in situ chemical synthesis of DNA on CMOS (complementary metal oxide semiconductor) chips has tremendous potential in the next decade. By solving the problem of hydrogen ion crosstalk between microelectrodes on the chip, the rapid and low-cost synthesis of terabyte-level DNA is expected to become achievable on a single chip. As shown in the schematic diagram, the voltage in different areas of the DNA synthesis chip is controlled by computer to selectively deprotect the bases at different sites, achieving high-throughput parallel synthesis of DNA with different sequences. However, the bottleneck is crosstalk between microelectrodes caused by the diffusion of acidic ions inside the CMOS chip, which limits synthetic capacity to the hundreds-of-megabytes level. After studying the behavior of hydrogen ion transport at the micrometer scale, we suggest redesigning the CMOS chip with new materials and structures, aiming to suppress ion diffusion between microelectrodes and to optimize the structure of the microelectrodes and microfluidic channels of the chip and device. In the future, the synthetic capacity of a single chip could reach the terabyte level for CMOS-based in situ parallel DNA chemical synthesis with electro-acidified deprotection, which would significantly accelerate the practical applications of synthetic biology and related technologies.
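
    The site-selective deprotection described above can be illustrated with a toy simulation. This is a sketch under simplifying assumptions: bases are offered to the whole chip in a fixed repeating order, each step is error-free, and there is no crosstalk between sites. The function name `synthesize_on_chip` and its parameters are hypothetical and do not come from the paper.

```python
def synthesize_on_chip(targets, cycle_order="ACGT", max_cycles=10_000):
    """Simulate electrode-addressed parallel synthesis (idealized).

    In every cycle one base is offered to the whole chip. A site is
    "switched on" (deprotected and extended) only if the next base it
    needs equals the offered base. Returns the grown strands and the
    number of cycles used.
    """
    grown = ["" for _ in targets]
    cycles = 0
    while any(len(g) < len(t) for g, t in zip(grown, targets)):
        if cycles >= max_cycles:
            raise RuntimeError("targets not completed; check for invalid bases")
        offered = cycle_order[cycles % len(cycle_order)]
        for site, target in enumerate(targets):
            position = len(grown[site])
            # Only sites whose next required base matches are activated.
            if position < len(target) and target[position] == offered:
                grown[site] += offered
        cycles += 1
    return grown, cycles


strands, used = synthesize_on_chip(["ACGT", "TTTT", "GATC"])
print(strands, used)
```

    In this idealized model the number of cycles is set by the slowest site, not by the number of sites, which is why adding more independently addressable electrodes raises throughput. Crosstalk, the bottleneck named in the abstract, would correspond to a neighboring site being extended when it should have stayed protected.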

    DNA data storage: preservation approach and data encryption
    Tingyao ZHOU, Yuan LUO, Xingyu JIANG
    2021, 2(3):  371-383.  doi:10.12211/2096-8280.2020-084

    With the rapid development of information technology and the Internet, human society has entered a new era of big data. According to the Global DataSphere forecast from International Data Corporation (IDC), more than 5.9 × 10^22 bytes of data will be created and consumed in 2020, and a data growth rate of 26% will be sustained through 2024. This volume has outpaced the capability of existing storage formats, including magnetic, optical and electronic media. To narrow the growing gap between explosive data production and current storage capability, it is highly desirable to explore novel solutions for data storage. As an emerging data storage medium, DNA offers substantial advantages over conventional media, including ultra-high data storage density (theoretically 10^6 times higher than existing technology), low energy consumption and long lifetime (up to several hundred thousand years in theory), and has great potential for future applications. In this review, we present the basic theory and workflow of DNA data storage, including encoding, writing, storage and encryption, random access, reading, and decoding. We also discuss research progress on data retention strategies, highlighting in vitro and in vivo DNA storage. Compared with the in vivo strategy, in vitro storage may have the greater potential for applications in terms of cost, durability and scalability. We briefly summarize the latest research on information security and data encryption using DNA. Finally, we discuss the current challenges and emerging trends in DNA data storage. The cost of DNA synthesis and sequencing largely restricts the rapid development of DNA data storage. Further unresolved questions include efficient preservation and feasible random access. To address these challenges, improving the efficiency of DNA data storage, automating storage and reading, and developing new strategies for data encryption will be important research directions. It is believed that, with the continuous development of synthetic biology, DNA data storage will become the most promising form of information storage in the future.
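
    The growth figures quoted above can be turned into a rough projection. This is a small sketch, assuming that the 26% rate compounds annually from the 2020 figure. The compounding interpretation and the function name are assumptions; the abstract states only the rate and the end year.

```python
def project_data_volume(base_bytes: float, annual_growth: float, years: int):
    """Return yearly volumes, starting with the base year, under compound growth."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return [base_bytes * (1.0 + annual_growth) ** year for year in range(years + 1)]


# 2020 baseline and growth rate as quoted in the abstract; annual compounding assumed.
volumes = project_data_volume(5.9e22, 0.26, 4)
for year, volume in zip(range(2020, 2025), volumes):
    print(f"{year}: {volume:.3e} bytes")
```

    Under that assumption, the yearly volume reaches roughly 1.5 × 10^23 bytes in 2024, about 2.5 times the 2020 figure.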

    The pivotal biochemical methods in DNA data storage
    Yanmin GAO, Mengtong TANG, Qian LIU, Hongyan QIAO, Taoxue WANG, Hao QI
    2021, 2(3):  384-398.  doi:10.12211/2096-8280.2020-085

    With rapid progress in biotechnology, especially array-based DNA synthesis and next-generation sequencing (NGS), DNA has demonstrated great advantages in data storage capacity, storage stability and repeatable reading. However, the biochemical methods currently used to manipulate the large-scale oligonucleotide (oligo) pools that carry digital information still face major challenges. For example, DNA integrity and stability are affected by preservation conditions such as temperature and humidity. Dropout and mutation (substitution, insertion or deletion) of DNA oligos are amplified by biased manipulation steps, including chemical synthesis, amplification (PCR) and NGS. Large unevenness in oligo copy number means that more sequencing resources are required to recover all the necessary strands in the pool. In addition, missing sequences and base errors increase the cost of the decoding process. As a result, DNA data storage is still confined to the laboratory. From the perspective of the biochemical methods for manipulating large-scale oligo pools, we summarize the causes of biochemical problems such as heterogeneity of oligo copy number, mutation and DNA decay in the processes of microarray DNA synthesis, storage and amplification. We also review a series of biochemical methods developed to address these problems, from oligo synthesis to amplification. These methods include improved synthesis processes, adjusted chemical process parameters, modified oligo pool normalization methods, optimized PCR conditions, variant PCR (emulsion PCR) and novel isothermal amplification (strand displacement amplification). In addition, measures should be taken in the encoding strategy to mitigate uneven oligo copy numbers and aid error correction. Moreover, we demonstrate the feasibility and efficiency of these biochemical methods in reducing the abovementioned problems in DNA data storage. Finally, we discuss and analyze the challenges in existing DNA data storage. With the development of biotechnology and of encoding and decoding strategies, we believe that these bottleneck issues will be solved and that DNA data storage will be applied in real-world settings in the near future.
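
    The link between uneven copy number and sequencing cost can be made concrete with a simple sampling model. The following is a sketch, assuming each read is drawn independently with probability proportional to a strand's copy number. The model, the function names and the 4:1 skew are illustrative assumptions, not results from the review.

```python
def expected_missing(copy_numbers, reads: int) -> float:
    """Expected number of distinct strands never observed in `reads` draws."""
    if any(c <= 0 for c in copy_numbers):
        raise ValueError("copy numbers must be positive")
    total = sum(copy_numbers)
    # A strand with sampling probability p is missed by all reads with (1 - p)^reads.
    return sum((1.0 - c / total) ** reads for c in copy_numbers)


def reads_for_recovery(copy_numbers, tolerated_missing: float = 1.0) -> int:
    """Smallest read count (on a coarse grid) with expected dropouts below tolerance."""
    step = max(1, len(copy_numbers) // 10)
    reads = step
    while expected_missing(copy_numbers, reads) > tolerated_missing:
        reads += step
    return reads


uniform_pool = [1.0] * 1000                 # every strand equally abundant
skewed_pool = [1.0] * 500 + [4.0] * 500     # half the strands four times more abundant
print(reads_for_recovery(uniform_pool), reads_for_recovery(skewed_pool))
```

    In this model the rare strands dominate the sequencing depth needed, so the skewed pool requires more reads than the uniform pool of the same size. This is the motivation for pool normalization and for encoding-level redundancy that tolerates some dropout.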

    A global patent analysis: trends in DNA synthesis and information storage
    Daming CHEN, Xuebo ZHANG, Xiao LIU, Yue MA, Yan XIONG
    2021, 2(3):  399-411.  doi:10.12211/2096-8280.2021-040

    With the rapid growth of digital data production, there is strong motivation to develop new data storage media. DNA, as a medium for data storage, has attracted the attention of many research institutions and enterprises. DNA-based storage technology is evolving quickly and has great market potential. It is predicted that, by 2024, about 30% of digital businesses will begin trialing DNA for data storage. To meet this huge market demand, a matching patent strategy has become extremely important for gaining the initiative in the competition. This article analyzes the development trends of DNA synthesis and storage technology from the perspective of patent analysis. By comprehensively using keywords, international patent classification, patentee and inventor searches and other methods, we searched for and obtained 1833 patents related to DNA synthesis and storage on a global scale (excluding the gene sequencing patents required for DNA storage technology, and also excluding DNA synthesis patents dedicated to diagnosis, treatment and other applications). Based on individual reading and comparison, we used patent value analysis, citation analysis, cluster analysis, technical efficacy analysis and other methods to select representative patents that are expected to provide references for further investigation, patent portfolio planning and operational decisions in this field. To use DNA to store digital information, the synthesis of oligonucleotides or polynucleotides is the basis of "writing". So far, oligonucleotide and polynucleotide synthesis technology has gone through three generations of development worldwide. The first and second generations use the phosphoramidite chemical synthesis method, while the third generation is based on the principle of enzymatic synthesis. The global patent analysis reveals that the number of patents published each year increased greatly as oligonucleotide and polynucleotide synthesis evolved towards phosphoramidite chemistry combined with microarray-based chip technology. Many established companies have joined the development of the second generation of synthesis technology. The emergence of the third-generation synthesis technology, which is based on the use of polymerases such as terminal deoxynucleotidyl transferase, is also reflected in the number of patents. Meanwhile, the techniques required for DNA-based storage, such as nucleic acid assembly, sequence design and information storage, are also developing rapidly. The patentees behind these technologies show clear cross-industry integration, since companies such as Microsoft, Intel and Huawei have successively joined the competition and cooperation around DNA storage patents. It is foreseeable that DNA-based storage technology will inevitably involve the convergence of technologies from multiple fields such as chemistry, materials, biology, informatics, mechanics and electronics. The further integration of these technologies would promote a regularly upgrading development path, similar to "Moore's Law", in the field of DNA storage. Built on high-throughput, high-efficiency, high-fidelity and low-cost DNA synthesis, using comprehensive information encoding and decoding, and integrating "edit"-"write"-"read"-"dissolve" functions, the DNA storage system of the future will become a truly "usable" solution.

    Research Article
    Chamaeleo: an integrated evaluation platform for DNA storage
    Zhi PING, Haoling ZHANG, Shihong CHEN, Ming NI, Xun XU, Sha ZHU, Yue SHEN
    2021, 2(3):  412-427.  doi:10.12211/2096-8280.2020-083

    The emerging field of DNA-based data storage has attracted considerable interest because of the enormous potential of DNA as a high-density, durable medium. Compared with traditional magnetic, optical and electronic storage media, DNA is considered a promising novel solution for meeting the global demand for storing the skyrocketing amount of data. In addition, DNA storage adds an extra layer of protection for the stored information, because the coding and decoding process relies on the combined use of DNA synthesis and sequencing technologies, which are not as commonly used as the technologies of the information and communication field. Transcoding between binary digital data and quaternary DNA molecules is the most important step in the whole process of DNA-based data storage. Several coding methods have been developed in different programming languages over the past decades. However, it is difficult to compare the overall performance of these methods because of their different software architectures and varying parameters. This makes it challenging for researchers to develop the methods further and for users to compare them and choose a suitable one. In this study, we introduce an integrated evaluation platform, "Chamaeleo", to address these issues. One key feature of Chamaeleo is its integration of existing coding schemes and its modularization of functions, including data handling, transcoding, index operation and error correction, in a user-friendly design. The other key feature is its ability to evaluate a coding scheme both qualitatively and quantitatively. A set of widely recognized and accepted indexes is chosen to evaluate compatibility with DNA writing and reading technologies, robustness in terms of tolerance of introduced errors or data loss, and the complexity of the transcoding rules. Considering the rapid advancement of this field, Chamaeleo is designed as an open-source platform into which researchers can incorporate new coding schemes and evaluation indexes, thus encouraging the community to contribute together to shaping the future of DNA-based data storage.
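
    To make the transcoding step and the idea of evaluation indexes concrete, the following is a minimal sketch of a naive two-bits-per-base mapping, together with two commonly used sequence metrics (GC content and longest homopolymer run). It is not Chamaeleo's API or any of the coding schemes it integrates. All names here are hypothetical and the mapping is chosen only for illustration.

```python
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Map every pair of bits to one nucleotide (2 bits per base, no constraints)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Invert encode(); expects a length that is a multiple of 4 bases."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def gc_content(dna: str) -> float:
    """Fraction of G and C bases; 0.0 for an empty sequence."""
    return (dna.count("G") + dna.count("C")) / len(dna) if dna else 0.0


def max_homopolymer(dna: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    longest = run = 0
    previous = ""
    for base in dna:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


sequence = encode(b"DNA")
print(sequence, gc_content(sequence), max_homopolymer(sequence), decode(sequence))
```

    A naive mapping like this achieves the maximum of 2 bits per base but gives no control over GC content or homopolymer runs. Practical schemes typically trade some density for such constraints and for error correction, which is the kind of trade-off an evaluation platform is meant to quantify.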

    Multiple interleaved RS codes for data storage using up to Mb-scale synthetic DNA in living cells
    Weigang CHEN, Qi GE, Panpan WANG, Mingzhe HAN, Jian GUO
    2021, 2(3):  428-443.  doi:10.12211/2096-8280.2021-023

    Synthetic DNA, as a potential digital data storage medium, has a high storage density and can be preserved for a very long time. It is expected to serve as an important option for future massive data storage. However, the synthesis, assembly and sequencing of DNA often introduce multiple types of base errors, so the raw process does not satisfy the reliability requirements of data storage, while reliability-enhancing coding schemes usually sacrifice logical coding density by adding redundancy. To deal with this problem, we propose an encoding process for DNA data storage using large synthetic DNA fragments in Saccharomyces cerevisiae. The DNA data chunks are constructed by interleaving multiple codewords of very high-rate Reed-Solomon (RS) codes, and alternate with autonomously replicating sequences (ARSs) to form a yeast artificial chromosome. Using high-throughput sequencing, data readout combines short-read assembly based on de Bruijn graphs, ARS-guided contig combination and erasure/error correction to achieve reliable data recovery. The error correction capability is fully exploited by using interleaving to spread large missing fragments into random erasures across all the RS codewords, and by correcting more erasures than errors. We designed and simulated a 2.5 Mb ring chromosome and successfully recovered the original data from 20× high-throughput sequencing reads. The simulated sequencing data were generated using the ART simulation software, which had been trained on real sequencing data from an artificial chromosome of 254 886 bp previously constructed for data storage. All the processes, including large DNA chunk assembly, DNA replication, extraction and high-throughput sequencing, are viewed together as the DNA storage channel in the information-theoretic sense. We provide an efficient encoding scheme that matches the codes to the DNA storage channel based on the information theory paradigm. The logical density of the data DNA chunks was 1.973 bit/bp, and the overall logical density still reached 1.947 bit/bp when the biological units (ARSs and vector backbones) were included. The demonstrated design process can support DNA coding schemes of different lengths from Kb up to Mb, which provides flexible verification and support for wet experiments in the synthesis and sequencing of large DNA fragments for digital data storage.
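
    The role of interleaving can be shown with a toy example. This is a sketch only: it models symbol placement and a contiguous loss, not the RS encoding or decoding itself. The function names, the four codewords of ten symbols and the 12-symbol burst are illustrative assumptions that do not reflect the parameters used in the paper.

```python
def interleave(codewords):
    """Emit symbol 0 of every codeword, then symbol 1 of every codeword, and so on."""
    length = len(codewords[0])
    if any(len(cw) != length for cw in codewords):
        raise ValueError("all codewords must have the same length")
    return [cw[i] for i in range(length) for cw in codewords]


def deinterleave(stream, n_codewords: int):
    """Recover the per-codeword symbol lists from an interleaved stream."""
    return [stream[j::n_codewords] for j in range(n_codewords)]


def erase_burst(stream, start: int, burst_len: int):
    """Replace a contiguous run of symbols with None to mimic a missing fragment."""
    return [None if start <= i < start + burst_len else s
            for i, s in enumerate(stream)]


# Four hypothetical codewords of ten symbols each, labelled (codeword, position).
codewords = [[(j, i) for i in range(10)] for j in range(4)]
damaged = erase_burst(interleave(codewords), start=7, burst_len=12)
erasures_per_codeword = [cw.count(None) for cw in deinterleave(damaged, 4)]
print(erasures_per_codeword)
```

    Without interleaving, the same 12-symbol gap would wipe out an entire 10-symbol codeword. With interleaving, each codeword loses only three symbols at known positions. Because an RS code with r parity symbols can correct up to r erasures but only ⌊r/2⌋ errors at unknown positions, converting gaps into erasures allows a high code rate: the reported 1.973 bit/bp corresponds to about 98.7% of the 2 bit/bp maximum.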