The Perils of Digital Humanities for Academics

Dominique Clément

Why does historical training at universities place so little emphasis on research methods? The rise of digital humanities presents a fundamental challenge to how we train historians. But for anyone pondering a career in academia, it’s a perilous journey where the risks might not be worth the rewards.

We are in the digital age, yet historical research remains primarily a modified pencil and paper discipline – laptops instead of paper, cameras instead of pencils. I’m an historian who happens to teach in sociology. Research methods are central to the undergraduate and graduate curriculum in sociology. In contrast, historical training remains the equivalent of throwing your kid into the deep end of the pool – head off to the archives and figure it out on your own.

The lack of training in digital research methodologies is a profound failing of our discipline. There are few Canadian conference sessions, workshops, publications, or networks where historians can dialogue about their experiments with new technology. In a special edition of the Canadian Historical Review in 2020 on the use of digital tools for historical research, Ian Milligan discusses his survey of historians’ use of digital cameras for archival research: 92 per cent used a digital camera but 90 per cent had no formal training. Over 40 per cent took over 2000 images for their last project. But 70 per cent simply used their own device rather than professional equipment.

Technology is changing, but not our training. Most historians have to teach themselves how to use digital tools. I’ve been designing websites such as HistoryOfRights.ca since 1999, but in every instance I had to figure it out on my own. Remember Microsoft FrontPage? Or Adobe Dreamweaver? A number of Canadian universities offer courses on digital humanities for historians, such as Sean Kheraj’s digital history course at York University or UQAM’s new Histoire et humanités numériques program. Still, it is hardly a pervasive feature of the discipline.

My full entry into the digital humanities began in 2014. I collaborated with scholars at five universities across the country to create a historical database of government grants to nonprofit organizations in Canada. None of us had any experience with databases. I spent four months in the spring/summer of 2014 using a training manual for FileMaker software. I literally sat in front of my computer with the manual in front of me for eight hours a day trying to parse technical jargon with a history degree as my only training.

I created the FileMaker database to facilitate our team’s collaboration. It was hosted on a server, which enabled team members to upload and share their data simultaneously while using the software’s tools to analyze the data. Over the next three years I spent hundreds of hours tweaking the database, creating new scripts and calculations, adapting the interface to our evolving needs, and trying to fix the innumerable glitches that arise when creating a new digital tool. It was exhausting work. There was a fair bit of cursing at my computer screen. My partner got so sick of all the bellyaching about the database at home that, after two years, she insisted that I stop saying the word ‘database’ and refer to the thing as ‘Ernie’.

We also had to develop a method for collecting historical data on grants to nonprofit organizations. Our preliminary database was to be a listing of grants from 1960 to 2014 for three jurisdictions – Federal, British Columbia, Nova Scotia – focussing on four issue-areas: environment, human rights, Indigenous peoples, and women. Federal and provincial grants are listed in a publication titled Public Accounts. We scanned over 50,000 pages of the publication. We then processed the files using optical character recognition (OCR) software (ABBYY FineReader Corporate Edition, abbyy.com) and converted them into spreadsheets. The spreadsheets were then processed using WinPure (winpure.com) and OpenRefine (openrefine.org).

It was an exhausting and immensely stressful process. We’d never heard of ABBYY FineReader or OpenRefine when we began the project. None of us had much experience with large-scale digital projects. We began by photographing pages by hand with digital cameras until we discovered that the library could assist our team using proper book scanners. And even the best OCR software doesn’t produce perfectly clean spreadsheets. We spent months experimenting with different strategies to transform paragraph-style lists of grants into spreadsheets (Python is great, but it requires a modest talent for programming and has some limitations, especially when working with historical documents).
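To give a sense of what transforming a paragraph-style list into a spreadsheet can involve, here is a minimal Python sketch. It is an illustration only, not the project's actual script: the sample text, the semicolon-separated layout, and the names `parse_grants` and `to_csv` are all assumptions, and real Public Accounts pages vary far more than this.

```python
import csv
import io
import re

# Hypothetical OCR output: one paragraph, entries separated by semicolons,
# each ending in an amount. The organization names are invented.
SAMPLE = (
    "Grants to voluntary organizations: Example Civil Liberties Association, "
    "25,000; Sample Native Women's Council, 8,500; Illustrative Wildlife "
    "Federation, 112,000."
)

# A recipient name, a comma, then an amount written with optional "$" and commas.
ENTRY = re.compile(r"^(?P<name>.+?),\s*\$?(?P<amount>\d[\d,]*)$")


def parse_grants(paragraph):
    """Split a paragraph-style list into (recipient, amount) rows.

    Returns the parsed rows plus the chunks that did not fit the pattern,
    so that a person can review them instead of losing them silently.
    """
    head, sep, body = paragraph.partition(":")
    if not sep:  # no heading before the list
        body = paragraph
    rows, rejects = [], []
    for chunk in body.split(";"):
        chunk = " ".join(chunk.split()).rstrip(".")  # collapse whitespace
        if not chunk:
            continue
        match = ENTRY.match(chunk)
        if match:
            amount = int(match["amount"].replace(",", ""))
            rows.append((match["name"], amount))
        else:
            rejects.append(chunk)
    return rows, rejects


def to_csv(rows):
    """Render the rows as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["recipient", "amount"])
    writer.writerows(rows)
    return buffer.getvalue()


rows, rejects = parse_grants(SAMPLE)
print(to_csv(rows))
```

Keeping the rejected chunks matters more than the pattern itself: with historical documents, a rule that fits one year's layout often fails on the next, and the leftovers show where it broke.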

The source material: Public Accounts of Canada (1972).

Because reporting practices change over time and among levels of government and jurisdictions, the dataset required extensive cleaning to ensure accuracy and to facilitate the analysis. The names of organizations appear in different formats and styles; OCR software often produces typos or symbols rather than numbers or letters; and there are usually trailing spaces, random characters, and a mix of upper/lower case. It required hundreds of hours from over a dozen people over several years to simply clean the data.
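The kinds of clean-up described above can be sketched in a few lines of Python. This is a hedged illustration rather than the project's workflow, which relied on WinPure and OpenRefine; the function names, the list of OCR confusions, and the sample values are assumptions made for the example.

```python
import re
import unicodedata

# Assumed OCR confusions in numeric fields (letter read in place of a digit).
OCR_DIGITS = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})


def clean_name(raw):
    """Tidy an organization name without changing its wording."""
    text = unicodedata.normalize("NFKC", raw)
    # Replace stray symbols with a space; keep letters (including accented
    # ones), digits, and punctuation that legitimately appears in names.
    text = re.sub(r"[^\w\s&'’.,()/-]", " ", text)
    return " ".join(text.split())  # trim and collapse whitespace


def match_key(name):
    """Build a case- and punctuation-insensitive key for comparing names."""
    key = re.sub(r"[^\w\s]", "", clean_name(name).casefold())
    return " ".join(key.split())


def clean_amount(raw):
    """Recover an integer amount from an OCR'd string, or None if empty."""
    digits = re.sub(r"\D", "", raw.translate(OCR_DIGITS))
    return int(digits) if digits else None


print(clean_name("  EXAMPLE  Rights ASSN.~ "))
print(match_key("  EXAMPLE  Rights ASSN.~ ") == match_key("Example Rights Assn"))
print(clean_amount("$l2,5OO"))
```

The match key is only for comparison; the cleaned name is what would be stored. Automatic digit substitution is risky outside numeric columns, which is one reason this sort of cleaning still needs people to check the results.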

And people make honest mistakes. One research assistant accidentally duplicated over 10,000 records in the database. Yet at first we could not figure out how it happened or which records were duplicated (an impressive accomplishment, really). Another forgot to input the province and city for grant recipients for thousands of federal grant records.
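One common way to hunt for duplicated records is to group them by the fields that should jointly identify a grant and flag any group with more than one member. The sketch below assumes hypothetical field names (`id`, `recipient`, `year`, `amount`, `source_page`); it is not how the FileMaker database was structured or how the problem was actually solved.

```python
from collections import defaultdict


def find_duplicates(records, fields=("recipient", "year", "amount", "source_page")):
    """Group record ids by a normalized key and return groups larger than one."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(str(record.get(f, "")).strip().casefold() for f in fields)
        groups[key].append(record["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Invented sample data: records 1 and 3 differ only in case and spacing.
records = [
    {"id": 1, "recipient": "Example Rights Assn", "year": 1972,
     "amount": 25000, "source_page": 114},
    {"id": 2, "recipient": "Sample Wildlife Federation", "year": 1972,
     "amount": 8500, "source_page": 114},
    {"id": 3, "recipient": "example rights assn ", "year": 1972,
     "amount": 25000, "source_page": 114},
]

for key, ids in find_duplicates(records).items():
    print(ids, key)
```

Including the source page in the key is a deliberate choice: an organization can legitimately receive two identical grants in one year, so flagged groups are candidates for review, not records to delete automatically.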

We also faced dozens of challenges in creating a coherent dataset: the names of government ministries change all the time; reporting practices differ among jurisdictions and time periods; only grants above a threshold amount are reported in Public Accounts, and that threshold also differs over time and among jurisdictions; there are lists of transfer payments for the provinces dating to 1960 but only from 1972 for the federal government; the federal government did not report transfer payments for 1993 and 1994; and so much more. It didn’t help that there is no comprehensive collection of Public Accounts anywhere in Canada. It took years to find and digitize a complete collection.

We created an open data research portal with a database of 161,044 grants to 11,470 organizations totalling over $129 billion. There is also a digital archive of Public Accounts for most provinces and the federal government. In addition, we were able to refine our data collection and coding practices over time to facilitate the creation of new datasets, including over a million records of additional federal, provincial, and municipal grants data that will eventually be integrated into the database.

The online portal was launched in 2020 (statefunding.ca). For the first time in my career, I’ll be reporting a digital project as my main deliverable for my annual report at the University of Alberta. I have no doubt that the database will be recognized as a legitimate contribution to scholarship. At the same time, however, universities are struggling with how to assess the value of digital outputs. The AHA’s Guidelines for the Professional Evaluation of Digital Scholarship is helpful but was not designed as a practical tool for comparing digital with traditional scholarship.

I spent several years on my university faculty’s annual evaluation, tenure, and promotion committee. We never really did find a consensus on how to evaluate this type of research. The State Funding for Social Movements project has been the focus of half my professional career so far. It required thousands of hours of work over five years from me alone, not to mention the rest of the team. Yet I doubt it will get the same recognition as an article, much less a monograph. The monograph and peer-reviewed journal article reign supreme.

Ideally, in the near future, we will see history departments provide more formal training in digital research tools. And universities will provide the infrastructure for accessing this technology. I was an Assistant Professor when I started this project in 2014. I’m not sure I would do it again or advise a junior colleague to contemplate a large-scale digital research project. Given our lack of formal training, it’s an immense time commitment and a huge risk when compared to the guarantee of producing traditional scholarship.

Dominique Clément is a Professor in the Department of Sociology at the University of Alberta and a member of the Royal Society of Canada (CNSAS). He is the author of Canada’s Rights Revolution, Equality Deferred, Human Rights in Canada, and Debating Rights Inflation. Clément has been a Visiting Scholar in Australia, Belgium, China, Ireland and the United Kingdom. His websites, HistoryOfRights.ca and statefunding.ca, serve as research and teaching portals on the study of human rights and social movements.


Resources

AHA Guidelines for the Professional Evaluation of Digital Scholarship

Canada’s Human Rights History

National Council of Public History – Evaluation of Public History Scholarship

State Funding for Social Movements

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License. Blog posts published before October 28, 2018 are licensed with a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Canada License.

3 thoughts on “The Perils of Digital Humanities for Academics”

  1. José Igartua

    A familiar story. It’s always (since the 1970s and earlier) been the case that using computing methods for historical research has been much more labour-intensive than the traditional perusing of manuscript or printed sources.

    I took a FORTRAN course as an elective in my Ph.D. course work in 1967. It made me realize that computing would not be helpful for my Ph.D. research.

    Later, I was fortunate to learn computing at UWO’s Faculty of Social Science computing lab (FORTRAN, Basic, SPSS) and then to teach computing for historians at UQAC and UQAM. Great help from UQAM’s computing centre research support staff in setting up ORACLE databases for my book on Arvida.

    Neither Western’s Social Science Computing Lab nor UQAM’s computing centre research support exists anymore, so these training avenues have disappeared. But there are a lot of online tutorials now that weren’t available forty years ago.

    The investment in labour-intensive digitization and “cleaning” of historical documents should be assessed against the importance of the historiographical contribution one expects to make from it. Teamwork is an added benefit as well as an added cost. To my mind, the scholarly production of Canada’s largest such collective efforts (PRDH, the Saguenay Project, the Vancouver Island Project, the University of Ottawa’s Canadian Century Research Infrastructure Project, etc.) amply justifies the heavy investments they required.

  2. James Keeline

    When I read the headline and opening paragraph, I thought this was going to be about the state of teaching research techniques, evaluating sources, and other critical thinking skills as they apply to digital humanities.

    However, it seems that the topic soon shifted to pioneering efforts to digitize records. This is extremely important and as a web professional for 25+ years who started using LAMP (Linux, Apache, MySQL, PHP) database sites in 2000, I fully respect what you accomplished.

    I am involved in a number of projects which involve scanning materials that relate to my avocational field of juvenile series books (think Nancy Drew, Tom Swift and hundreds of similar books). These are done completely from my own pocket and spare time, such as I have when I am not working full time as a Linux system administrator. I’ve specialized in this field for about 33 years now and my Stratemeyer.org site is just a tiny portion of my work in it.

    I may never have occasion to use your works but I appreciate you and all who put materials out there in a findable form.

Please note: ActiveHistory.ca encourages comment and constructive discussion of our articles. We reserve the right to delete comments submitted under aliases, or that contain spam, harassment, or attacks on an individual.