the future of long-term data storage and access

June 4, 2009

Iron Crystals

Iron crystals in carbon nanotubes can represent binary data. (image credit: Zettl Research Group/LBNL/UC Berkeley)


Cuneiform from ca. 3500 B.C. (thanks Wikipedia)

One of the funny paradoxes of storing data is that the more “advanced” our way of recording information, the more fragile it is. Thankfully the ancient Mesopotamians didn’t store their agricultural transactions on DVDs, otherwise they’d have been lost in as little as ten or fifteen years. A recent study by Alex Zettl of UC Berkeley, reported in Science Magazine News, describes a new, long-lasting data storage technique involving nano-scale iron crystals moving back and forth in carbon nanotubes. The idea is that data stored this way will stay intact for a very, very long time.

Of course, if we begin archiving all of our contemporary data (magazines, newspapers, blog posts, balance sheets, emails), we’ll have so much of it in 100 years that it will be hard to make sense of it all. The great thing about manuscripts found from the Dark Ages, or cuneiform from even far earlier, is that the specimens were so few and far between that hordes of Ph.D. students and professors were able to extrapolate big ideas from very few bits of data, which is convenient in many ways. The problem in a hundred years is that we’ll have so much data, no historian will be able to extrapolate anything.

In all seriousness, though, data mining, the process of sifting through tons of data to find the relevant bits, is already vital to historical research and will only become more so. Google pioneered a new way of finding relevant parts of the web based on how “interconnected” they are, but will this be possible once all of that data is put into archives?
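To make the “interconnected” idea a bit more concrete, here is a minimal sketch of link-based ranking in the spirit of PageRank. It is not Google’s actual implementation; the function name `link_rank`, the damping value of 0.85, and the tiny made-up archive are all assumptions for illustration only.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score items by how well linked they are (simplified PageRank-style iteration).

    `links` maps each item to the list of items it points to.
    """
    # Collect every item that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with equal scores that sum to 1.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every item gets a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass this item's score on to the items it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # An item with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical archive: each record "cites" other records.
archive = {
    "cuneiform_tablet": ["harvest_ledger"],
    "harvest_ledger": ["cuneiform_tablet", "temple_inventory"],
    "temple_inventory": ["harvest_ledger"],
    "stray_blog_post": ["harvest_ledger"],
}
scores = link_rank(archive)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

In this toy archive the record that everything else points to floats to the top, while the record nobody cites sinks to the bottom. Whether archived data will still carry this kind of link structure is exactly the open question.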

Clearly, we’ll need far more powerful search engines than we have today to handle the trillions of terabytes of data we’ll have archived. In addition, clever computer science historians will have to develop rigorous ways of deciding which data is “relevant” to a user’s search query and which is not. An interesting shift will occur in historical analysis: primary sources will no longer be sparse; instead, the hard part will be separating the useful ones from the rest for a particular search. In a sense, we will abdicate a good bit of historical decision-making (what’s important, what’s not) to computer algorithms.
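For a taste of what “deciding what is relevant” can look like in practice, here is a rough sketch of TF-IDF scoring, one classic and very simple relevance measure. It is only one of many possible approaches, and the function names, the whitespace tokenizer, and the three sample documents are assumptions made up for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_scores(query, documents):
    """Score each document against the query using term frequency times
    inverse document frequency."""
    tokenized = {name: tokenize(text) for name, text in documents.items()}
    n_docs = len(tokenized)

    # Count how many documents contain each word at least once.
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))

    scores = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        score = 0.0
        for term in tokenize(query):
            if counts[term] == 0:
                continue  # the term does not appear in this document
            tf = counts[term] / len(words)               # how prominent the term is here
            idf = math.log(n_docs / doc_freq[term])      # how rare the term is overall
            score += tf * idf
        scores[name] = score
    return scores


# Hypothetical archived documents.
documents = {
    "ledger": "barley harvest ledger for the temple",
    "email": "meeting moved to friday please confirm",
    "newspaper": "record barley harvest reported in the valley",
}
relevance = tf_idf_scores("barley harvest", documents)
for name in sorted(relevance, key=relevance.get, reverse=True):
    print(name, round(relevance[name], 3))
```

Even in this tiny example the algorithm, not the historian, decides that the ledger matters more than the newspaper and that the email does not matter at all.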

If this abdication of power from humans to machines makes you uncomfortable, get used to it. It’s going to be happening a lot in the years to come.


