Library of Congress Meets Twitter: Massive Archiving Ensues

Was getting one of your works archived in the Library of Congress on your bucket list of things to do before you die? Well, congratulations! You can cross that one off and turn your attention to tightrope walking, meeting Billy Idol, or being the next contestant on The Price Is Right because, odds are, your work has been archived!

On January 4th, the U.S. Library of Congress released a statement saying that it has successfully gathered 170 billion (and growing by the second) public tweets and has already begun the unimaginable task of cataloging, indexing, and archiving each and every one of them. Apparently, the Library of Congress and Twitter reached an agreement back in 2010 that gives the government institution access to the 21 billion tweets that accumulated between 2006 and 2010, followed by the 150 billion more that have been posted since.

Sound like a total waste of taxpayer dollars? Maybe, but here is what the Library of Congress had to say about their new project:

“Twitter is a new kind of collection for the Library of Congress but an important one to its mission. As society turns to social media as its primary method of communication and creative expression, social media is supplementing, and in some cases supplanting, letters, journals, serial publications, and other sources routinely collected by research libraries.

Though the library has been building and stabilizing the archive and has not yet offered researchers access, we have nevertheless received approximately 400 inquiries from researchers all over the world. Some broad topics of interest expressed by researchers run from patterns in the rise of citizen journalism and elected officials’ communications to tracking vaccination rates and predicting stock market activity.”

Although there are critics who would bash the project as a waste of time, effort, and resources, the Library of Congress has a great point. Just as it has collected war letters and the journals of historical figures over the decades for the insight they provide into a different time, our tweets will one day serve the same purpose. Did you write a letter or a journal entry about how the recent presidential election made you feel? Probably not, but the Library of Congress is willing to bet that you (and your mom, cousin, BFF…) tweeted about it. What better way to preserve the insane amount of would-be historical information that is being posted all day, every day?

Even though the Library of Congress itself isn't entirely sure how the archive will be used, it has released a PDF that outlines the entire project. The PDF also notes that the 170 billion tweets add up to 133 terabytes of data, and that each tweet contains 50 metadata fields, providing much more information than just the simple text of the tweet. Unsure what metadata is? Check this out.
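
For a rough sense of scale, here is a quick back-of-the-envelope sketch that divides the archive size by the tweet count. The two totals come from the PDF; the function name and the choice to show both decimal and binary terabytes are my own assumptions, since the PDF's exact byte definition isn't given here.

```python
# Back-of-the-envelope: average storage per archived tweet.
# The totals below are the figures quoted from the Library of Congress PDF.
# Which definition of "terabyte" the PDF uses is an assumption, so both
# the decimal (10**12 bytes) and binary (2**40 bytes) readings are shown.

TWEET_COUNT = 170 * 10**9   # 170 billion tweets
ARCHIVE_TERABYTES = 133     # total archive size in terabytes


def avg_bytes_per_tweet(total_bytes, tweet_count):
    """Return the mean number of bytes stored per tweet."""
    return total_bytes / tweet_count


decimal_estimate = avg_bytes_per_tweet(ARCHIVE_TERABYTES * 10**12, TWEET_COUNT)
binary_estimate = avg_bytes_per_tweet(ARCHIVE_TERABYTES * 2**40, TWEET_COUNT)

print(f"Decimal terabytes: about {decimal_estimate:.0f} bytes per tweet")
print(f"Binary terabytes:  about {binary_estimate:.0f} bytes per tweet")
```

Either way, the average lands in the high hundreds of bytes per tweet, several times the 140 characters a tweet can hold, which is at least consistent with the metadata fields accounting for much of the space.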

The statement made by the Library of Congress about the project and its progress can be read here, on their blog.

The recently released PDF file that outlines the project and includes updates as recent as this month can be accessed here.