The British Library Plans to Archive the UK Web

Around this time back in 2010, the Library of Congress (LOC) announced on its website that it would archive every tweet in Twitter's firehose dating back to March 2006. It was a bold decision, and since then the LOC has amassed over 170 billion tweets. The LOC said:

As society turns to social media as a primary method of communication and creative expression, social media is supplementing, and in some cases supplanting, letters, journals, serial publications and other sources routinely collected by research libraries.


Now the British Library has made a similar decision and wants to archive the entire UK web. Unlike the LOC's Twitter-only collection, this effort covers websites, blogs, forums and social media sites. The library estimates that a total of 4.8 million websites and over 1 billion webpages will go into the archive. The entire process will take about five months: three months of data collection followed by two months of processing. The processed data will serve as a vital resource for researchers of future generations.

A similar project was undertaken in Iceland, where all websites since 2004 are being archived. This is commendable work. The fact that long-established institutions like these are giving digital content a place in history shows how willing they are to evolve beyond their traditional model. What worries me, however, is that their effort is largely redundant. The Internet Archive has been doing exactly this for over a decade, and it would make much more sense for these individual libraries to collaborate with the Internet Archive and fund it properly to do the work.


Published by

Chinmoy Kanjilal

Chinmoy Kanjilal is a FOSS enthusiast and evangelist. He is passionate about Android. Security exploits turn him on and he loves to tinker with computer networks. You can connect with him on Twitter @ckandroid.