
Encoding Literature in DNA

A couple of decades from now, your copy of the Bible or Harry Potter (or the best-selling book of the 22nd century, whatever that might be) might just be stored in a small vial of liquid or on a small chip. Harvard University researchers have just encoded a book in DNA fragments rather than as a physical copy or an e-book.

The Alphabet of DNA

DNA is made up of building blocks called nucleotides, much as written English is made up of letters. The DNA alphabet has just 4 letters instead of 26: ‘A’, ‘T’, ‘G’ and ‘C’. In information-theory terms, each letter in DNA can thus encode 2 bits of information. Each nucleotide weighs around 330 Daltons (one Dalton is 1.66×10⁻²⁴ g). Thus, a single gram of single-stranded DNA could encode 455 exabytes (1 exabyte is 10¹⁸ bytes) of information. The previous sentence says ‘single-stranded’ because in nature, DNA molecules form two strands that wrap around each other to form a helix. The second strand adds mass but no new information, so a single gram of double-stranded DNA holds about half as much. That is still around 225 exabytes, not a small number!
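The arithmetic behind these figures can be checked in a few lines of Python. This is only a back-of-the-envelope sketch: the average nucleotide mass of 330 Daltons is an assumed round figure (it is the value that reproduces roughly 455 exabytes per gram), and the function name is made up for illustration.

```python
DALTON_IN_GRAMS = 1.66e-24   # mass of one Dalton, as quoted in the text
NUCLEOTIDE_DALTONS = 330     # ASSUMPTION: average mass of one nucleotide
BITS_PER_BASE = 2            # 4 possible bases, and log2(4) = 2
BYTES_PER_EXABYTE = 1e18


def exabytes_per_gram(strands=1):
    """Theoretical storage capacity of one gram of DNA, in exabytes."""
    nucleotides = 1 / (NUCLEOTIDE_DALTONS * DALTON_IN_GRAMS)
    # In double-stranded DNA the second strand is fixed by the first,
    # so it adds mass without adding information.
    informative_nucleotides = nucleotides / strands
    total_bits = informative_nucleotides * BITS_PER_BASE
    return total_bits / 8 / BYTES_PER_EXABYTE


single = exabytes_per_gram(strands=1)
double = exabytes_per_gram(strands=2)
print(f"single-stranded: about {single:.0f} exabytes per gram")
print(f"double-stranded: about {double:.0f} exabytes per gram")
```

With these inputs the single-stranded result lands in the mid-450s and the double-stranded result is exactly half of it, in line with the figures quoted above.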

The four building blocks of DNA, also called bases (shown in green, red, yellow and blue) can be used as effective storage devices. [Image Credit: restlessmindboosters]

Translating English into DNAese

Encoding a book in DNA essentially means translating English into a code written in a language of 4 letters. An HTML-encoded book called ‘Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves’, containing 53,426 words, 11 JPG images and a JavaScript program, has now been translated into the language of DNA. While DNA encoding has been done on smaller scales before, the increasing ease and decreasing cost of DNA sequencing have made it possible to encode larger quantities of text in this biological molecule.
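To make the idea of "translating" concrete, here is a minimal sketch of the densest possible scheme, two bits per base. The assignment of bit pairs to bases (A=00, C=01, G=10, T=11) and the function names are assumptions chosen for illustration, not the scheme the researchers used.

```python
BASES = "ACGT"  # ASSUMPTION: A=00, C=01, G=10, T=11


def text_to_dna(text):
    """Turn text into a string of bases, 4 bases per byte."""
    bases = []
    for byte in text.encode("utf-8"):
        # Read the byte two bits at a time, most significant pair first.
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases)


def dna_to_text(dna):
    """Reverse text_to_dna: rebuild each byte from 4 consecutive bases."""
    data = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return data.decode("utf-8")


encoded = text_to_dna("Hi")
print(encoded)               # CAGACGGC
print(dna_to_text(encoded))  # Hi
```

Every byte of the text becomes exactly four bases, so a 1,000-character ASCII passage would need 4,000 bases under this scheme.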

The entire book was translated onto small fragments of DNA called oligonucleotides. Each fragment carried a piece of the book's data plus a small ‘address’ block recording where in the book that piece belonged. Together, these fragments form a ‘library’ of oligonucleotides, created on a DNA microchip. To ‘read’ the book, this library has to be amplified and sequenced using molecular approaches. The researchers encoded just one bit of information per DNA base instead of the maximum two, and made multiple copies of each oligonucleotide so that errors could be accounted for, yet they still obtained a whopping density of 5.5 petabits (1 petabit is 10¹⁵ bits) per cubic millimeter.
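The address-plus-data layout described above can be sketched as follows. This is a toy illustration under stated assumptions, not the researchers' actual pipeline: the bit-to-base mapping (0 written as A or C, 1 written as G or T), the block sizes and all function names are hypothetical, and the redundant copies used for error correction are left out.

```python
import random

ZERO_BASES, ONE_BASES = "AC", "GT"  # ASSUMPTION: 0 -> A or C, 1 -> G or T
ADDRESS_BITS = 8                    # hypothetical address block size
DATA_BITS = 16                      # hypothetical data block size


def bits_to_bases(bits, rng):
    """One bit per base: pick either of the two bases allowed for each bit."""
    return "".join(
        rng.choice(ONE_BASES if bit == "1" else ZERO_BASES) for bit in bits
    )


def bases_to_bits(bases):
    return "".join("1" if base in ONE_BASES else "0" for base in bases)


def encode(data, rng):
    """Split data into fragments, each prefixed with its own address."""
    bits = "".join(f"{byte:08b}" for byte in data)
    oligos = []
    for address, start in enumerate(range(0, len(bits), DATA_BITS)):
        chunk = bits[start:start + DATA_BITS].ljust(DATA_BITS, "0")
        oligos.append(bits_to_bases(f"{address:0{ADDRESS_BITS}b}" + chunk, rng))
    return oligos


def decode(oligos, length):
    """Rebuild the data from fragments read in any order."""
    blocks = {}
    for oligo in oligos:
        bits = bases_to_bits(oligo)
        blocks[int(bits[:ADDRESS_BITS], 2)] = bits[ADDRESS_BITS:]
    bits = "".join(blocks[address] for address in sorted(blocks))
    return bytes(int(bits[i:i + 8], 2) for i in range(0, length * 8, 8))


rng = random.Random(0)
message = b"DNA stores books"
oligos = encode(message, rng)
shuffled = oligos[:]
rng.shuffle(shuffled)  # a sequencer returns fragments in no particular order
recovered = decode(shuffled, len(message))
print(recovered)
```

Because every fragment carries its own address, the order in which fragments come back does not matter. The price is that 8 of every 24 bases in this toy layout are spent on addressing rather than on the book itself.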

Current costs of sequencing make this technology prohibitively expensive. However, the costs of DNA synthesis and sequencing are falling exponentially, which makes DNA a feasible storage medium for the future. DNA is also stable at room temperature, meaning it can be preserved for long periods. While DNA storage and retrieval are slower than other methods, the density offers huge potential. It could thus be used for archival storage of massive amounts of data.

You can read more about this research here.