Archive Forests

What if data could be stored in a truly sustainable manner?

About ten years ago, our society entered a new era: the data era. Digital information is nothing new but, since the beginning of the last decade, it has become the fuel for increasingly powerful and increasingly autonomous algorithms which govern almost all parts of our daily lives.

The causes of this shift are primarily technological: the performance of microprocessors has continued to increase exponentially, and the emergence of cloud computing has pushed the limits of computing power even further. And this may only be a beginning if advances in quantum computing open up new perspectives.

Unlike oil, which was at the origin of a previous industrial revolution, data is not a fuel in limited supply. It is quite the opposite actually, so as soon as the world understood its value, production accelerated.

Between 2010 and 2019, the volume of data produced was multiplied by 20, and it is expected to triple again by 2024. Occasions to delete data are rare: for regulatory reasons, or simply through oversight or convenience, huge amounts of information are stored on servers or hard drives without ever being consulted, or only very rarely. As a result, the volume of information increases much faster than storage capacity. The physical challenge is obvious: solutions must be found to avoid a global shortage of storage capacity in the medium term.

Figure: Volume of information/data created in the world (in zettabytes; source: Statista)
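To get a feel for the pace behind these figures, the overall multipliers can be turned into an implied average yearly growth rate. The sketch below assumes steady compound growth, which is a simplification; the function name `annual_growth_rate` and the year spans (9 and 5 years) are our own reading of the figures quoted above, not values taken from Statista.

```python
def annual_growth_rate(multiplier: float, years: float) -> float:
    """Average yearly growth rate implied by an overall multiplier,
    assuming steady compound growth over the whole period."""
    return multiplier ** (1.0 / years) - 1.0


# Figures quoted in the text; the year spans are an assumption.
past = annual_growth_rate(20, 2019 - 2010)   # multiplied by 20 between 2010 and 2019
future = annual_growth_rate(3, 2024 - 2019)  # expected to triple between 2019 and 2024

print(f"2010-2019: {past:.1%} per year")
print(f"2019-2024: {future:.1%} per year")
```

Under these assumptions, the first period works out to roughly 40% growth per year and the second to roughly 25% per year.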

The other global issue is environmental. In 2016, data centers accounted for about 3% of global electricity consumption, and this proportion is expected to increase to about 20% by 2025. In terms of greenhouse gases, data centers accounted for about 2% of emissions in 2016 (i.e. as much as the airline sector) and this share is expected to increase to 14% by 2025. To put this in context, 14% is the share of the US in global emissions today. The environmental impact of artificial intelligence or blockchain is often singled out, and rightly so. But these technologies also hold so much promise that it seems difficult to envisage limiting their development.

In the medium term, there is a real need to develop alternative technologies that both store a greater quantity of information in the same physical volume and reduce the environmental footprint.

Traditional storage methods include magnetic media (cassettes, floppy disks, hard drives), optical storage (CD, DVD, Blu-Ray), and flash storage (USB flash drive, SD cards, SSD). Putting aside the environmental dimension, the attractiveness of any data storage medium is measured by the following characteristics:

  • Density: bits stored per unit of volume
  • Retention: duration during which the data can be recovered without loss
  • The energy cost of information, both at rest and when accessed
  • Speed of access: time and bandwidth required to retrieve the information

Given recent advances in terms of DNA synthesis and sequencing, one of the avenues to explore is in vivo molecular storage, in particular for long-term archiving where speed of access is secondary to density, retention and energy cost.

Molecular storage consists of using molecular constituents to encode information. There are multiple proofs of concept in which the nitrogenous bases A, C, T and G are combined into synthetic DNA that carries data. For example, in 2018, Robert Grass encoded Massive Attack’s album Mezzanine as part of an artistic project. The DNA strands thus created were stored in microscopic beads and then mixed into spray paint used for graffiti.

Illustration of the methodology of R. Grass (2018)

And in June 2019, scientists at a start-up announced they had encoded 16 Gigabytes of Wikipedia pages into DNA form.
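The principle of encoding digital data in the four bases can be sketched in a few lines. This is a minimal illustration, not the method used in the projects mentioned above: the 2-bits-per-base mapping (A=00, C=01, G=10, T=11) and the function names `encode` and `decode` are assumptions made for the example. Practical schemes typically add error correction and addressing, and avoid sequences that are hard to synthesize or sequence.

```python
# Hypothetical mapping chosen for illustration: A=00, C=01, G=10, T=11.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Rebuild the original bytes from a strand produced by encode()."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # BASES.index raises ValueError on anything other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


message = "Mezzanine".encode("utf-8")
strand = encode(message)
assert decode(strand) == message  # the round trip preserves the data
print(strand)
```

With this naive mapping, every byte costs four bases, so a strand is four times as long (in bases) as the data is in bytes.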

In the above examples, molecular storage was in vitro. The next step is in vivo storage, i.e. the synthetic DNA strands are reintroduced into living cells. The amount of information contained in the cell's DNA would therefore increase - which should not be confused with an increase in the cell's genetic material. Genes are sequences that encode vital proteins and they represent only a very small part of DNA. In human DNA, genetic information makes up less than 2% – the remaining 98% is often considered useless (“junk” DNA), although recent findings suggest that some of it may play a key catalytic role in gene expression.

What is the real potential of molecular storage in vivo?

  • Density: DNA offers a density of up to 10¹⁸ bits per mm³, i.e. approximately 1 million times denser than the densest medium existing today. To illustrate this: there are 3 billion base pairs in every cell of every human being. In the plant world, the Loblolly pine has 23 billion base pairs in the nucleus of each of its cells.
  • Retention: protected from light and humidity, DNA can remain intact for centuries or even millennia, compared to only a few decades for current storage technologies. This is how DNA from fossils dating back thousands of years has been sequenced. Another, more tangible example is a parasitic plant producing giant flowers, which has been storing DNA sections of its (former) hosts for millions of years.
  • The energy cost of information at rest: virtually zero.
  • Speed of access: this is the real weak point of this technology because the DNA must be sequenced to access the information. For this reason, the only conceivable application of this technology is archiving.
  • In addition, DNA has a unique characteristic: the replication of information is a natural process requiring no resources. This is essential to ensure the integrity of the information by providing copies to correct potential encoding or decoding errors.
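Two of the points above lend themselves to a quick sketch: the raw capacity implied by a genome size, and the use of several copies to correct errors. The code below is an idealized illustration under stated assumptions: it counts 2 bits per base pair with no redundancy, treats the whole genome length as if it were available for data (which it is not, since the figures above are natural genome sizes), and uses a simple per-position majority vote that only handles substitutions in reads of equal length. The names `raw_capacity_bytes` and `consensus` are hypothetical.

```python
from collections import Counter

BITS_PER_BASE = 2  # assumption: idealized upper bound, no redundancy


def raw_capacity_bytes(base_pairs: int) -> float:
    """Upper bound on the bytes that a sequence of this length could hold."""
    return base_pairs * BITS_PER_BASE / 8


def consensus(copies: list[str]) -> str:
    """Per-position majority vote across equal-length copies of a strand."""
    if len({len(copy) for copy in copies}) != 1:
        raise ValueError("all copies must have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*copies)
    )


# Genome sizes quoted in the text, used only as orders of magnitude.
for name, base_pairs in [("human", 3_000_000_000), ("Loblolly pine", 23_000_000_000)]:
    print(f"{name}: {raw_capacity_bytes(base_pairs) / 1e9:.2f} GB")

# Three reads of the same strand, one of them with a substitution error.
reads = ["ACGTAC", "ACGTAC", "ACCTAC"]
print(consensus(reads))
```

Using an odd number of copies avoids ties in the vote. Insertions and deletions, which shift the positions of later bases, would need an alignment step that this sketch leaves out.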

In 2013, Fister & Ljubic from the University of Maribor in Slovenia explored the possibility of storing information in plants’ DNA, and in 2016 they proposed a first commercial application: tracking intellectual property rights for certain varieties of plants or seeds. Compared to any other medium for storing information, plants have the unique advantage of contributing positively to the environment if they grow naturally, meaning without artificial lighting and without irrigation. At a large scale, the benefit of their oxygen production could even outweigh the carbon footprint of the synthesis and sequencing processes. And when it comes to destroying archived information, plants can be recycled: burned to produce energy, used to make paper or, in the case of wood, used as a raw material. If archive forests were created, this could also contribute to the fight against deforestation and soil erosion in certain regions.

Today, however, the cost of synthesis, rather than that of sequencing, is still too great an obstacle for DNA storage to be a credible alternative to existing technologies. But in the medium term, if synthesis and sequencing technologies follow the same curve as information technologies, certain commercial applications can begin to be envisaged.

If this technology eventually emerges, we can imagine archive forests where companies store their information for 10, 20, 30 years or more, and consult it on demand by sequencing a cell from a single leaf. In this hypothetical scenario, the archiving industry would not be the only one to experience a profound disruption: the forestry industry would also be deeply affected.

In the nearer future, and so long as costs do not decrease sufficiently to allow large volumes to be processed and the corporate segment to be addressed, this technology would primarily be relevant for individuals wishing to use an atypical medium to store information about themselves for the very long term, even after their death, possibly to leave memories for posterity. And as we are talking about people’s willingness to leave a legacy, it is worth noting that some pay particular attention to the material of the coffin in which they will be buried… So why not imagine that they could also choose a tree species in which to leave their digital identity?


Sources:
  • Nature Reviews Genetics: Molecular Digital Data Storage Using DNA, Luis Ceze, Jeff Nivala and Karin Strauss, Volume 20, August 2019
  • Statista: Information created globally, July 2020
  • Computerworld: Why Datacenters are the new frontier in the fight against climate change, August 2019
  • K. Ljubic, I. Fister: Storing data into a living plant, Technical Report, Maribor: FERI, 2013
  • Scientific American: DNA Data Storage Is Closer Than You Think, July 2019
  • Scientific American: What Is Junk DNA, And What Is It Worth, February 2007
  • Wired UK: With AI and DNA, Massive Attack are hacking a new kind of music, March 2019
  • Quanta Magazine: DNA of Giant "Corpse Flower" Parasite Surprises Biologists, April 2021
  • Wikipedia