DNA strands refined into a new data storage medium

DNA can last for many centuries. In the laboratory, a new DNA storage technique combines a parallelizable writing process with fast read-out.


So-called epi-bits, epigenetically modified derivatives of the DNA nucleobases, can be placed on a DNA strand like letters on paper.

(Image: Arizona State University / Jason Drees)

By Malte Kirchner

Deoxyribonucleic acid (DNA) does more than carry the genetic information of plants and animals: data storage in DNA macromolecules could one day solve many problems of long-term archiving. Because the sequence of the four DNA bases can be varied freely, a single gram of DNA could encode 17 exabytes of data (17 million terabytes), as scientists at the University of Washington calculated back in 2020. In addition, DNA macromolecules can survive unchanged for centuries or even millennia if they are stored dry and protected from atmospheric oxygen. During this time, they retain the data stored in them without requiring energy.
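The density figure can be checked with a few lines of arithmetic. The following is only a rough sketch: it assumes decimal (SI) units and takes the quoted 17 exabytes per gram at face value. The one-petabyte archive is a hypothetical example, not a figure from the researchers.

```python
# Figure from the article: a theoretical 17 exabytes per gram of DNA.
EXABYTE = 10**18    # decimal (SI) bytes; an assumption about the units used
TERABYTE = 10**12
PETABYTE = 10**15

density_bytes_per_gram = 17 * EXABYTE

def grams_needed(num_bytes: int) -> float:
    """Mass of DNA needed for num_bytes at the quoted theoretical density."""
    return num_bytes / density_bytes_per_gram

terabytes_per_gram = density_bytes_per_gram // TERABYTE
micrograms_per_petabyte = grams_needed(PETABYTE) * 1e6  # hypothetical archive size

print(terabytes_per_gram)                  # 17000000, i.e. 17 million terabytes
print(round(micrograms_per_petabyte, 1))   # micrograms of DNA for one petabyte
```

At this theoretical density, a petabyte would fit into a few dozen micrograms of DNA. Real systems would need considerably more material for redundancy and addressing.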

Based on these findings, the DNA Data Storage Alliance (DDSA) aims to develop and standardize a common, universal DNA storage system. In addition to Microsoft and hard disk manufacturer Western Digital, the founding members of this industry initiative, launched in October 2020, include Illumina, a developer of sequencing devices, and Twist Bioscience, a specialist in DNA synthesis. To date, however, DNA synthesis, i.e. the step-by-step construction of DNA strands from the four natural nucleobases adenine, guanine, cytosine and thymine, has proven very time-consuming; too time-consuming to encode and then archive very large data sets.

Now researchers at Arizona State University (ASU) in Tempe, in cooperation with international partners such as Laura Na Liu, head of the 2nd Institute of Physics at the University of Stuttgart, have developed a new and faster DNA storage technology. They use universal, prefabricated DNA strands on which they make epigenetic modifications. Epigenetics is nature's method of regulating gene activity by adding chemical groups to DNA or removing them. The researchers are adapting this natural mechanism and using it to encode digital information instead of biological instructions. By attaching methyl groups to certain DNA bases, they create so-called epi-bits, molecular data points. A methylated base (epi-bit "1") and an unmodified, non-methylated base (epi-bit "0") correspond to the ones and zeros of the binary code used in computer technology.
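The mapping from binary data to epi-bits can be illustrated with a small sketch. This is a simplified model, not the researchers' software. The function names and the letters "M" (methylated cytosine, epi-bit 1) and "C" (unmodified cytosine, epi-bit 0) are notation chosen here for illustration.

```python
def bytes_to_epibits(data: bytes) -> list[int]:
    """Expand bytes into a flat list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]

def write_marks(bits: list[int]) -> str:
    """Render bits as marks on cytosine sites: 'M' = methylated (1), 'C' = unmodified (0)."""
    return "".join("M" if bit else "C" for bit in bits)

def read_marks(marks: str) -> bytes:
    """Invert write_marks; assumes the number of marks is a multiple of 8."""
    bits = [1 if mark == "M" else 0 for mark in marks]
    out = bytearray()
    for start in range(0, len(bits), 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)

marks = write_marks(bytes_to_epibits(b"Hi"))
print(marks)               # CMCCMCCCCMMCMCCM
print(read_marks(marks))   # b'Hi'
```

The base sequence of the carrier never changes in this model. Only the pattern of marks on it carries the data.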

Specifically, the researchers are working with 5-methylcytosine (5mC), a derivative of the DNA nucleobase cytosine. They use not only a universal single-stranded DNA carrier (ssDNA), but also short ssDNA building blocks, each complementary to a short sequence of the carrier. Together, these building blocks effectively form a library. The researchers showed that they can compose arbitrary epi-bit combinations from the sequences in this library and assemble them on the identical loading sequences of the DNA carriers. They then succeeded in stably modifying bases on the DNA carrier by selective methylation. This writing process worked not only with comparatively high accuracy, but also in parallel on up to 700 different DNA segments.
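The idea of writing from a prefabricated library instead of synthesizing new strands can be modeled in a few lines. The sketch below is a loose abstraction under simplifying assumptions: one block per carrier site, one bit per block, and an error-free transfer of marks. The names `Block`, `build_library`, `select_blocks` and `methylate_carrier` are hypothetical and do not come from the publication.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    site: int         # carrier position this block is complementary to
    methylated: bool  # whether the block carries a 5mC mark

def build_library(num_sites: int) -> dict[tuple[int, int], Block]:
    """Prefabricate one methylated and one unmethylated block per carrier site."""
    return {(site, bit): Block(site, bool(bit))
            for site in range(num_sites) for bit in (0, 1)}

def select_blocks(bits: list[int], library: dict[tuple[int, int], Block]) -> list[Block]:
    """Pick the ready-made block for each site; no new DNA is synthesized."""
    return [library[(site, bit)] for site, bit in enumerate(bits)]

def methylate_carrier(num_sites: int, blocks: list[Block]) -> list[int]:
    """Transfer each block's mark to its own carrier site (modeled as one reaction)."""
    carrier = [0] * num_sites
    for block in blocks:  # order is irrelevant: each block addresses its own site
        carrier[block.site] = int(block.methylated)
    return carrier

message = [1, 0, 1, 1, 0, 0, 1, 0]
library = build_library(len(message))
written = methylate_carrier(len(message), select_blocks(message, library))
print(written == message)   # True
```

Because every block finds its site independently, the order of the blocks does not matter. In this toy model, that independence is what allows all sites to be written in the same step.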

Under laboratory conditions, the scientists achieved a writing speed of 350 bits per chemical reaction and, to date, 40 bits per second. In this way, they encoded a message consisting of two images, a stylized tiger from the ancient Chinese Han Dynasty and a photo of a panda, with a total of 270,000 bits in less than two hours. Although this is still too slow for archiving large amounts of data, the new method has the great advantage that it does not require de novo construction of DNA strands, i.e. building them base by base from scratch. The researchers are confident that the parallel operation of their method, combined with industrial technology that has yet to be developed, will further accelerate the writing process. They also point out that, in addition to parallel processing at the molecular level, a future data storage device could also write several DNA strands in parallel.
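The figures quoted above are consistent with each other, as a short back-of-the-envelope calculation shows. The sketch assumes that the 40 bits per second apply uniformly to the whole message. Dividing the message size by 350 bits per reaction gives only a rough lower bound on the number of reactions. The one-gigabyte case is a hypothetical extrapolation, not a measurement.

```python
import math

BITS_TOTAL = 270_000      # size of the two encoded images (from the article)
BITS_PER_SECOND = 40      # reported overall write rate
BITS_PER_REACTION = 350   # reported bits written per chemical reaction

write_seconds = BITS_TOTAL / BITS_PER_SECOND
write_hours = write_seconds / 3600
min_reactions = math.ceil(BITS_TOTAL / BITS_PER_REACTION)

# Hypothetical extrapolation: one gigabyte (8e9 bits) at the same rate.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
gigabyte_years = (8 * 10**9 / BITS_PER_SECOND) / SECONDS_PER_YEAR

print(write_hours)               # 1.875, i.e. just under two hours
print(min_reactions)             # 772
print(round(gigabyte_years, 1))  # years needed for a single gigabyte
```

At the current rate, a single gigabyte would take several years to write, which illustrates why the researchers are counting on further parallelization.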

"In our publication, we only describe the use of 5mC as epi-bits. However, it is conceivable that we could also use other base modifications and thus develop an entire alphabet for writing on the DNA strand," Hao Yan, one of the authors, told c't. He is head of the Biodesign Center for Molecular Design and Biomimetics at ASU and currently a visiting professor at the University of Stuttgart. With this extension, the data density on a written DNA strand could be multiplied again.

The researchers used a photo of a panda as one of the images to be stored. They used an error-correcting code to reproduce the photo without errors after the writing and reading process in the laboratory.

(Image: Arizona State University)
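How an error-correcting code can repair a misread epi-bit is shown in the following sketch. It uses the classic Hamming(7,4) code purely as a stand-in. The article does not say which code the researchers used, and their scheme is presumably more elaborate.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (parity bits at positions 1, 2 and 4)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 0 means no error, otherwise 1-based index
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
print(codeword)                      # [0, 1, 1, 0, 0, 1, 1]

damaged = list(codeword)
damaged[4] ^= 1                      # simulate one misread epi-bit
print(hamming74_decode(damaged))     # [1, 0, 1, 1]
```

The price of this robustness is redundancy: in this example, seven epi-bits are written for every four bits of payload.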

In addition, like the previous DDSA approach, in which DNA strands are built up sequentially, base by base, the new method produces a DNA strand that can be stored permanently. Like DNA in biological systems, this strand can also be copied easily, which could be useful for applications that distribute information.

The new DNA storage technology does not manipulate the base sequence of the DNA strand; for now, it only modifies individual cytosine building blocks, following the epigenetic model. Even so, its results can be read out using normal DNA sequencers. The fast reading technique using nanopore sequencers also works. In this relatively new technology, which has only been used more widely since 2015 thanks to new devices, the DNA double strand is separated into single strands, one of which is passed through a biological channel known as a nanopore.

The actual sequencing is achieved by applying an electrical voltage across the nanopore. As the strand passes through the pore, each of the four nucleotides (each one half of a base pair in the original double strand) leaves a characteristic pattern in the ion current. From this, the original base sequence can be deduced in real time. Epigenetically methylated cytosine can also be specifically identified with this sequencing technique.

Even though DNA memories are initially designed for use with existing computer systems, the researchers see a further direction for development. In future applications, it is conceivable to combine DNA memories with molecular computer systems so that data can be stored, processed and even used for calculations in the same medium. This would transform DNA from a pure storage molecule into an active participant in data processing. In the distant future, so-called bioinformatics could seamlessly combine data storage with biological functions.

First of all, however, the epi-bit system is designed simply as a digital data storage device. "You can think of it as an external hard disk for high-density, long-term data storage," Yan explains to c't. For practical applications, however, the speed still needs to increase. Even old USB sticks write data roughly a million times faster than 40 bits per second, and modern SSDs are several orders of magnitude faster still. (agr)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.