Archival Digitization and the Struggle to Create Useful Digital Reproductions

By Krista McCracken

Reverse of a photographic postcard, Benna Fuller Collection, Shingwauk Residential Schools Centre, 2011-8/001 (006)

The past decade has fundamentally changed how archives provide access to historical records.  Many archives now provide digital access to collections, have digitization on demand services, and have started to prioritize collections for digitization.  Much of this digitization has been driven by funding bodies and a desire to increase accessibility to collections. But how has the digitization of archival records been received by historians, genealogists, and other patrons?

One of the initial challenges presented by the digital representation of archival sources is the need to preserve context. Original order and provenance are fundamental principles of archival arrangement that help maintain context within collections. Original order allows connections between records to be illuminated, and provenance describes the origins of archival records. However, many archives have struggled to replicate in a digital environment the physical experience of browsing through an archival box. At times this challenge has resulted in the loss of context or an inability to determine original order online.

Some researchers are wary of treating the digital record as a true representation of the physical record. For example, an institution might decide to scan only the front of photographs for online consumption. Any notations on the rear of the photograph are then entered into a notes field. The notation is still preserved, but it’s not displayed in its original form. The transcription process used to enter the notation as metadata may include interpretations of handwriting, short forms, etc. The very process of transcription is subject to interpretation and human error.

Including a set of transcription standards on an archival website may help ease some of the worry about improper transcription or material being left out. However, many researchers prefer both a transcribed version and a digital copy of a handwritten archival record, which allows them to compare the transcription with the original.[1]

For cases of archival sources that are in machine readable text, using OCR software can eliminate some transcription errors and allow the documents to be full text searchable, which can be invaluable for researchers. That being said, OCR isn’t perfect and errors tend to be most evident when OCR is used on older texts with irregular or faded typefaces.  Correcting OCR errors can be extremely time consuming for archival staff and is something many organizations simply don’t have the time or staffing to devote to.[2]

Another common concern relating to the digitization of archival sources is the quality of the digital surrogates. If the digital copy is blurry or marked, is this the product of a poor scan or a true representation of the record?

Touched up photograph of St. Mary's River, circa 1880, public domain.

Photograph of St. Mary's River, circa 1880, public domain.

How can the researcher be sure that all marginalia and annotations are represented in the digital copy? The original photograph above includes a notation that the photograph was taken by William Dunlop in Sault Ste Marie.  But in the above representations of the photograph this note is missing. Including information about the scan resolution, condition of the original, notes, and any digital editing done can help alleviate these concerns.  Similarly, including a border around the scan allows researchers to be sure that the entire document or photograph has been scanned and hasn’t been cropped by someone during digitization.
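For archives that publish this kind of information alongside each scan, the idea can be sketched as a small record attached to every digital surrogate. The following is only an illustration under stated assumptions: the field names, the `SurrogateRecord` class, the `reader_warnings` function, and the example values are all hypothetical and do not come from any particular archival metadata standard or from the photographs shown here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SurrogateRecord:
    """Hypothetical description published alongside one digital surrogate."""
    identifier: str
    scan_resolution_dpi: int
    condition_of_original: str
    sides_scanned: List[str]          # e.g. ["recto"] or ["recto", "verso"]
    border_included: bool             # True if the scan shows the item's full edges
    transcribed_notes: List[str] = field(default_factory=list)
    digital_edits: List[str] = field(default_factory=list)


def reader_warnings(record: SurrogateRecord) -> List[str]:
    """List the caveats a researcher should see before trusting the image."""
    warnings = []
    if "verso" not in record.sides_scanned:
        warnings.append(
            "Reverse side not scanned; notations appear only as transcription."
        )
    if not record.border_included:
        warnings.append("No border around the scan; cropping cannot be ruled out.")
    if record.digital_edits:
        warnings.append("Image digitally edited: " + "; ".join(record.digital_edits))
    return warnings


# Made-up example values, for illustration only.
example = SurrogateRecord(
    identifier="example-001",
    scan_resolution_dpi=600,
    condition_of_original="faded, crease at lower left",
    sides_scanned=["recto"],
    border_included=False,
    digital_edits=["contrast adjusted"],
)

for line in reader_warnings(example):
    print(line)
```

A record like this does not make the surrogate any more faithful, but it tells the researcher what was and was not captured, which is the reassurance the paragraph above asks for.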

How can archivists and other heritage professionals work with researchers and historians to make the best use of digitization? Explaining how records are digitized, transcribed, and presented online helps mitigate many concerns regarding the authenticity of reproductions. In cases where only selections of a collection have been digitized, archives should be clear that only some of the material is available online and explain how the remaining material can be accessed.

Many digitization workflows include spot checking, reviews of content before it is published online, and other checks and balances. That being said, human error happens. The Art of Google Books project is an amusing example of small errors in mass digitization, namely images of scanner operators’ hands found in digitized books. Pages can be missed in scanning, and metadata can be entered incorrectly.

Archivists often work with expert researchers to correct processing errors and improve collection descriptions. For example, many archives have identified photographs based on patron knowledge and corrected dates or attributions based on patron research. Digital archival platforms that allow easy communication between archival staff and researchers help facilitate this and can strengthen archival description.

Digitization of traditional archival material is something many archives and heritage institutions are working to integrate into their day to day practices.  Understanding what patrons want and how they use existing digitized material is crucial to creating digitization programs which are effective and practical.

Krista McCracken is a Researcher/Curator at Algoma University’s Shingwauk Residential Schools Centre. She is a co-editor at Activehistory.ca.


[1] Alexandra Chassanoff, “Historians and the Use of Primary Source Materials in the Digital Age,” The American Archivist, vol. 76, no. 2 (Fall/Winter 2013), pp. 458-480.

[2] For those interested in the use of OCR in archives: Larisa K. Miller’s “All Text Considered: A Perspective on Mass Digitizing and Archival Processing” is a good example of a new take on OCR use in archives. Miller’s article suggests a drastic shift in how machine readable archival records are digitized and made available, by eliminating archival processing and finding aids and shifting to mass digitization and OCR.

4 thoughts on “Archival Digitization and the Struggle to Create Useful Digital Reproductions”

  1. It’s great to see folks outside the library and archives talking about this. A couple things:
    1. Most archives attempt to create a digital copy of the physical thing as is. Rarely do they touch up or photoshop the image (if it’s images we’re talking about). It’s possible that someone creating an exhibit or something might. Best practice is to leave it alone. Providing the user with a high resolution copy that they can inspect or download and inspect with their own software is another story. This is dependent on a number of factors — most notably copyright/privacy/ip and the policies of the host institution. Too often we find archives and libraries who feel it necessary to limit the use of the digital object rather than open it up for everyone to use. Regardless, your point about letting folks know how things are done is a good one. I think we’ll try to succinctly describe that process on Sask History Online as it evolves in the coming year or so.
    2. You’re right in saying that providing context is a challenge for archives and that respect des fonds is fundamental to the way archives work. There are a few things being done in digital libraries and archives to help combat that, one of which is the use of the RDF (Resource Description Framework) standard and another is simply curating quality digital exhibits. I would argue that there are also significant benefits to decontextualizing these collections. Collections pulled apart digitally can be reassembled in ways not envisioned by the archivist or even the original creator. The content can be mapped, remixed, grouped by subject, or pulled into a visualization of some sort. Powerful stuff.
    3. My final point is really a question/invitation. I’ve worked with several scholars, teaching faculty, researchers, or whatever label we like to use, who are more than interested in providing feedback, working with the collection, involving students in a number of ways. It is, for most of us, some of the most gratifying work we do. I wonder though, between all the work everyone does, just how to most effectively and efficiently work together? It would be great to connect researchers with more opportunities to help enrich descriptions, identify connections with other content, or simply promote its use.

    You’ll have to forgive the long-winded reply. You got me all excited about digitization and its role in active history.

    -Craig

  2. Thank you for your detailed comment, Craig. It provided a lot of food for thought.
    I agree with you that decontextualizing or re-framing collections can be beneficial. The reordering of collections and use of collections in visualizations can make material more accessible and often reveal things that the original order (or received order) might not.

    Working with researchers and community members to improve collections is rewarding and can be extremely valuable to archives. I agree that it would be great to see more organizations actively involving the public in collection enrichment. That being said, I think many community archives already do a good job of involving patrons in description and identification. And larger scale crowdsourcing projects (such as Transcribe Bentham) also provide examples of successful integration of user generated content. You’re right though — there are definitely ways that archives could be more efficient and effective when working with researchers, communities, and other organizations.

    Thanks again for your thoughtful comment.
    -Krista

  3. “One of the initial challenges presented by the digital representation of archival sources is the need to preserve context. Original order and provenance are fundamental…”

    Indeed! LAC has made a multitude of digitized photos publicly available on-line but since the only means of access is via search the order makes it difficult to find photos in sequence. However, I recently created a Windows Phone app to facilitate faster viewing of their First World War photo collection (http://tinyurl.com/n5f2d6k) and through that I was able to find and arrange quite a number of photos clearly taken in sequence which told a much deeper story than any single photo in a set could provide.

    For example, compare this clearly staged photo of German prisoners: http://tinyurl.com/nx2f7cp with this seemingly unstaged shot taken immediately after: http://tinyurl.com/oo7e8qd. A photo of Canadian soldiers serving cigarettes and coffee to German prisoners probably wouldn’t be published in 1916, but it says something subtle about the nature of trench warfare and the perception of ‘the enemy’ that few history books convey.

    There are dozens of other photo sequences that show similarly enlightening tales. In fact, I’m now disinclined to read too much into any single photo since it’s difficult to derive context without a before or after shot as well, especially considering that a great many photos from this period were clearly posed/staged.
