The Internet Archive Rocks, or, Two Million Plus Free Sources to Explore

Checking out an 1857 book from the Internet Archive, no big deal.

By Ian Milligan

For many students, it’s back to school season. For me, that means it is time to think about some of the resources and tools that are out there. If you want to research a topic, it’s worth keeping in mind some great repositories online. The big one online is the Internet Archive – which is not just old websites.

I’ve written about the Internet Archive before, and it’s actually the main source base for my current major research project. But today I want to give a brief sense of what else you can find there in terms of digitized primary sources, amongst this massive newfangled Library of Alexandria that should be so central to many of our workflows. If you’re a historian, or are interested in history, I guarantee you’ll find something useful in the Internet Archive. Heck, if you use Mozilla Firefox, install a search plug-in right now for it. We’ll be here when you get back.

The inspiration for this post is the accomplishment of yet another major milestone: two million books, all freely downloadable, generated by a large network of some 33 scanning centres around the world. And that’s just books – there are additionally millions of texts.

For Canadian historians, your starting point is going to be the “Canadian Libraries” portal, with some 421,143 items (I’m sure that number will be outdated by the time this goes live). Just scrolling down the page is an adventure in Canadian history: the 69,000 titles of the Canadian Institute for Historical Microreproductions (hosted by the U of Alberta), War of 1812 documents from Brock University, recipes and cookbooks, soldiers’ diaries, etc.

It’s amazing. The physical space of the library is noted, as you navigate collections by the Toronto Public Library, or a university, or a smaller community centre. But it’s all available here. In this we see the dream of utopian provisioners of access to historical information: sitting at home, without institutional subscriptions, being able to access arrays of primary sources. You can also do some cool stuff with this material that you couldn’t do with old paper copies:

Every Archive entry has this, which gives you the various options to read: from online, to PDF, to plain text, to e-pubs.

Download plain text files: Okay, elephant in the room – they’re often of pretty varying quality. You can imagine how difficult it is for a computer to read old dusty documents that even we, with our sophisticated brains, have trouble with. But it does mean you’ve got something you can just Ctrl/Cmd + F to search, to get a rough idea of what might be inside them. To do so, click on the ‘full text’ as you see at right.

Create your own library: PDFs or EPUBs are great for this, and yes, you can download those. Who needs those old file cabinets or file folders full of documents? You’ve got a digital library that you can explore, play with, or even load into one of those skeuomorphic Apple apps.

Download it all, automatically, and quickly: This takes a bit of extra work, but if you know how to use wget (which you can learn from the Programming Historian 2) you can follow these instructions and download an entire collection onto your computer. Instead of having to fly off to an archive yourself, the archive can virtually fly to you.

Don’t like to read online? PDFs and EPUBs mean you can read on your phone or tablet, or print them out: Enough said.
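
If wget isn’t your thing, the “download it all” idea above can also be roughed out in a few lines of Python. This is only a sketch under some assumptions, not the method from the instructions linked above: it assumes the Archive’s advanced search endpoint (advancedsearch.php) will list the items in a collection as JSON, and that an item’s OCR plain text lives at a file named after the item with a `_djvu.txt` suffix, which is a common pattern but not guaranteed for every item. The function names and the `example_collection` identifier are mine, made up for illustration.

```python
import json
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://archive.org/advancedsearch.php"
DOWNLOAD_URL = "https://archive.org/download"


def search_url(collection, rows=50, page=1):
    """Build a search URL listing item identifiers in one collection."""
    params = [
        ("q", f"collection:{collection}"),
        ("fl[]", "identifier"),   # only ask for the identifier field
        ("rows", rows),
        ("page", page),
        ("output", "json"),
    ]
    return f"{SEARCH_URL}?{urlencode(params)}"


def text_url(identifier):
    """Guess the plain-text URL for an item.

    Assumption: the OCR text is stored as <identifier>_djvu.txt.
    Items without OCR text will not have this file.
    """
    return f"{DOWNLOAD_URL}/{identifier}/{identifier}_djvu.txt"


def list_identifiers(collection, rows=50):
    """Fetch one page of item identifiers (this makes a network request)."""
    with urlopen(search_url(collection, rows)) as resp:
        data = json.load(resp)
    return [doc["identifier"] for doc in data["response"]["docs"]]


def download_text(identifier, dest_dir="."):
    """Save one item's plain text into dest_dir (network request)."""
    target = Path(dest_dir) / f"{identifier}.txt"
    with urlopen(text_url(identifier)) as resp:
        target.write_bytes(resp.read())
    return target
```

To use it, you would call `list_identifiers("example_collection")` with a real collection identifier in place of the placeholder, then loop over the results calling `download_text` on each one, ideally pausing a second or two between requests to be polite to the Archive’s servers.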

And that’s not all. The Internet Archive also has a pretty amazing TVNews archive, so if you’re interested in some aspect of how history is remembered or want to see some public history in action, a quick search brings you to short clips of newscasts from around the world. I’ve mined it on my own blog for fun. There are also videos and audio.

So my suggestion is: it’s September. New things have been added. Go check it out – if you haven’t been there in a while, spend 10 or 15 minutes poking around. I almost guarantee you that you’ll find something useful for your project.

Ian Milligan is a Canadian and digital historian at the University of Waterloo. Today also happens to be the first day of class, as well as the launch of his co-written book on Big Data and history, so he’s pretty stoked that he managed to get anything up on the blog today. 🙂

2 thoughts on “The Internet Archive Rocks, or, Two Million Plus Free Sources to Explore”

Sean Kheraj September 9, 2013 at 10:09 am

Great post today, Ian. I just want to echo the sentiment that this is the most important resource for historical research online. I introduce this resource to my students every year and many of them make use of it for their primary source research essays.

Daniel Joseph Samson (@ruralcolonialNS) September 11, 2013 at 7:37 am

Agreed, the potential is great – two years ago I published a piece researched entirely in my summer-house kitchen in PEI – but for many of us the uses remain better for teaching than for most research. Those of us who work in the 18th/19th centuries, largely in manuscripts, see that utopian dream as being just that. I speak too as someone who still has to go to the PRO occasionally to read the missing Colonial Office pages/files from microfilms made in the 1960s.