Undetermined by Borders: The Commonality of Counting

Peter Baskerville, University of Alberta

Society operates in a world of data abundance rather than data scarcity. Such abundance is leading to new understandings of innovation and production processes. Increasingly, innovation is generated by open collaborative interaction. Increasingly, too, mastering data abundance demands computational efficiency. Collaboration and computation challenge long-held norms for the ‘doing’ of history. A paradigm that privileges the lone scholar reading and analysing textual manuscripts needs rethinking.

All Historians count, some more explicitly than others. Today, with the help of computers, it is much easier to count explicitly than at any other time in History. Some worry that much is lost in such a perspective. I argue counting should never be done in a vacuum and counting properly will ‘defragment’ the historical profession. Counting explicitly can lead to a way of reconceiving what we thought we had seen clearly before. I argue for the return of micro-data.

The phrase micro-data (for examples of historical micro-data sets see http://www.rhd.uit.no/nhdc/micro.html) should send shivers of anticipation up and down Humanists’ spines: it evokes visions of the small, tiny, individual experience and highlights specific rather than general, potentially aberrant rather than common behaviours. Yet recently many humanists have shunned micro-data and privileged narratives that explore region, nation, and the grand synthesis. Historians divided over these issues in the history battles of a decade or more ago. I have no intention of returning to those somewhat tired debates. New issues, exciting ones, confront Historians today.

My brief for the return of micro-data rests on the contention that its mission must be thought anew. The use of micro-data allows us to put together a different vision of place, country, region, nation and to situate diversity within general explanatory frames. What is at issue is not a subversive desire to throw regional/national units of analysis on some dust heap: rather it is—among others—a desire to write rich analytic treatments that can help inform our knowledge of the past and improve the context within which we operate in the present and the future.

To achieve these results one requires copious and easily accessible micro-data of a non-trivial kind. That is precisely where we are today and will be increasingly tomorrow: swamped by easily accessible micro-data that cries out for informed use. Google has the rights to an unthinkably large portion of what has been printed, past and present, undetermined by borders. The internet provides researchers with billions of pages of data. What is a researcher to do: start at the beginning and read through to the end? No chance. One has to be systematic in one’s approach. Light touches on context will not suffice. Examples are not useful unless they are unusual or characteristic, and establishing either means counting and comparison.^[1] One needs to privilege representativeness, but how?

Many historians have trodden lightly around a set of tools that can be used to take advantage of this plethora of rich data. Some adherents of the New Cultural History have argued that there is a fundamental incompatibility between historical words and historical numbers. Yet both represent some version of reality in the past. Both need to be understood first in terms of their provenance. Neither offers complete answers to any set of questions. Moreover, quantification in history is not a school or a subdiscipline. It is a tool, and one that can today, thanks to such statistical packages as SPSS, be well enough mastered even by those who see themselves as mathematically challenged.

To take advantage of the multiplying gigabytes of micro-data, one will require a toolbox up to the challenge.^[2] A multi-pronged response is emerging and is called Humanities Computing. Tools for large-scale data analysis allow researchers to discover relationships and perform computations on data sets that are so large that they can be processed only by using advancements recently developed and rendered cost efficient.^[3] Text mining using XML and TEI results in databases, and databases lead to analysis of patterns, which means quantitative statements or methods.^[4]

A recent workshop held at the University of Alberta brought together five teams of Humanists who explored the relevance of High Powered Computing for their projects.^[5] Four groups privileged textual, image or sound analysis and one privileged numbers. Some discussion focused on the compatibility of numbers and text; in the end, however, all were classifying, categorizing and comparing relationships between various aggregations of micro-data. We need new terminology to describe what Humanists/Historians analyze:^[6] images and sound do not capture the totality of information available; text is burdened with conceptual/attitudinal issues that limit an appreciation for what the future is bringing. Numbers seem to evoke a deep-seated skepticism. A more all-inclusive term might be Humanist/Historical data sources. It is a phrase increasingly in use: witness the first Digging into Data competition sponsored by four major granting agencies (including SSHRC) across three nations in 2009. The agencies were overwhelmed by the number of applications. In many ways it is a harbinger of the future: Humanities research will be focused on large databases. Humanities research will utilize quantitative methods in order to make sense of that data. Humanities research will be increasingly international and interdisciplinary in collaboration and focus. (See the Digging into Data website for the announcement of a second funding round.) Yet even Digging into Data differentiated between text and data analysis, as if text were not a form of data, or even more as if text could be analyzed separately from ‘data’, a term one can only assume was meant to refer to numbers. Two of the successful teams focused on speech; one on music; one on visual images; three on written sources; only one mentioned numeric sources and that was after privileging textual and geographic materials.^[7] The analysis of these data sources ought not to be done in isolation one from another. At root, classification, categorization and exploration of relationships are common to all.

Great accomplishments have been achieved in making micro-data sources accessible. In the Canadian context the most dramatic advance is the Canadian Century Research Infrastructure Project (CCRI).^[8] It has provided public use samples of the nominal-level Canadian censuses for 1911 through 1951. On an international level such samples are increasingly commonplace.^[9] As we have outlined in an earlier Forum article, by privileging spatial, literary and quantitative data, the CCRI goes further than any database in the direction of inclusivity.

In the context of defragging history this approach needs to be underlined. The CCRI points to an integrated approach for historical studies. It does not imply that the quantitative trumps the qualitative or the reverse. When quantitative data seem at odds with contemporary commentaries, such commentaries are hardly displaced. But they are viewed from a different perspective and provoke new questions about their meaning and social importance.^[10] The end in view is a comprehensive approach to a holistic understanding of the past and its meanings for the present.^[11]

[1] David L. Hoover, “Quantitative Analysis and Literary Studies,” in Susan Schreibman and Ray Siemens, eds., A Companion to Digital Literary Studies, (Oxford: Blackwell, 2008), Chapter 28, http://www.digitalhumanities.org/companionDLS/

[2] For a compendium of potential tools see http://portal.tapor.ca/portal/portal. For an historian’s use of data mining see Chris Drummond, Stan Matwin and Chad Gaffield, “Inferring and Revising Theories with Confidence: Analyzing Bilingualism in the 1901 Canadian Census,” Applied Artificial Intelligence 20, no.1 (2006), 1-33.

[3] John Bonnett, Geoffrey Rockwell and Kyle Kuchmey, “High Performance Computing in the Arts and Humanities”, https://www.sharcnet.ca/my/research/hhpc. Melissa M. Terras, “The Potential and Problems in using High Performance Computing in the Arts and Humanities: the Researching e-Science Analysis of Census Holdings (REACH) Project”, DHQ: Digital Humanities Quarterly, 3, no. 4 (2009).

[4] Stephen Ramsay, “Databases,” in Susan Schreibman, Ray Siemens and John Unsworth, eds., A Companion to Digital Humanities, (Oxford: Blackwell, 2004), chapter 15, http://digitalhumanities.org/companion/; Donald A. Spaeth, “Representing Text as Data: the Analysis of Historical Sources in XML,” Historical Methods, 37, no. 2 (2004): 73-85.

[5] http://ra.tapor.ualberta.ca/mindthegap/.

[6] Rhetorical relativism points to the influence that rhetorical devices from simple words, to images to narrative frames exert on the expression of ideas and meanings. Some of the literature on these issues is cited by Lyle Dick in this collection.

[7] http://www.marketwire.com/press-release/SSHRC-Leading-International-Research-Agencies-Announce-Winners-Prestigious-New-Digging-1085655.htm.

[8] Historical Methods 40, 2007 and Eric Sager and Peter Baskerville, “Canadian Historical Research and Pedagogy: A View from the Perspective of the Canadian Century Research Infrastructure,” forthcoming, Canadian Historical Review, 2010. The 1911 database is available at the following web site: http://nesstar2.library.ualberta.ca/webview/. Other Canadian initiatives in this area include: For 1852 and 1881 http://www.prdh.umontreal.ca/census/en/uguide/OLD/1881projects.html. For 1871, http://www.isr.yorku.ca/. For 1891 and a version of 1871 http://www.uoguelph.ca/~kinwood/. For 1901 http://web.uvic.ca/hrd/cfp/data/index.html. For information on data linkage across time and space http://www.recordlink.org/

[9] Patricia Kelly Hall, Robert McCaa and Gunnar Thorvaldsen, eds., Handbook of International Historical Microdata for Population Research, (Minnesota Population Centre, 2000). See also http://www.ipums.org/ and http://www.nappdata.org/napp/intro.shtml

[10] A.W. Carus and Sheilagh Ogilvie, “Turning Qualitative into Quantitative Evidence: A well-used method made explicit,” Economic History Review, 62, 4 (2009), 893-925.

[11] The issues of management and sharing of data sources also require rethinking. See the editorial “Data’s Shameful Neglect,” in Nature, 461, no. 7261, (Sept, 2009) for a perspective from science and ‘Canadian Digital Information Strategy,’ Library and Archives Canada, http://www.collectionscanada.gc.ca/cdis/index-e.html
