Mark Humphries and Eric Story
You have probably heard about OpenAI’s ChatGPT, Microsoft’s Bing Chat or Google’s Bard. They are all based on Large Language Model (LLM) architectures that produce human-like text from user prompts. LLMs are not new, but they seem to have recently crossed a virtual threshold. Suddenly, artificial intelligence—or AI for short—is everywhere. While it is true that they sometimes “hallucinate,” producing factual errors and quirky responses, the accuracy and reliability of LLMs are improving exponentially. There is no escaping it: generative AI like ChatGPT is the future of information processing and analysis, and it will change the teaching and practice of history. Although some of its effects can be felt already, its long-term implications are not as clear.
Generative Pre-trained Transformer (GPT)-based LLMs are new and powerful tools that have only been around for about five years. The rapidity with which they have evolved to produce remarkably cogent prose, complete complex tasks, and pass theory of mind tests has astonished even those who created the technology. When prompted correctly, ChatGPT—which is based on the GPT-3.5 model—can write effectively, with an engaging style, good organization, and clarity. For context, its 45 terabytes of training data alone are the equivalent of about 215 million e-books, but it cannot access the Internet.
We have had access to the beta-mode of Microsoft’s new AI-enabled Bing since 14 February and it is another leap ahead of ChatGPT. It has a similar training base but can search for information on the web and analyze large bodies of text, as well as write essays, summaries, and emails right in a new Edge browser sidebar. Most importantly, it does these tasks in seconds through a conversational approach that, like ChatGPT, runs on a powerful neural network––that is, a series of computer processors arranged to mimic the synapses in the human brain. Using the new Bing truly feels like stepping into the future.
Transformer-based LLMs are very quickly changing the face of writing. Their appeal to plagiarists becomes clear when you realize generative AI can write a pretty good review of The Face of Battle that takes a critical look at the tendency of the author, John Keegan, “to pathologize soldiers’ experiences, rather than exploring the ways in which they were coping with and adapting to the stresses of combat […arguing] that soldiers who experienced symptoms of shell shock were often engaged in a process of creative adaptation, using their symptoms as a means of coping with the stresses and traumas of war.” That is a direct quote from a ChatGPT-generated review, based on a simple prompt to analyze how Keegan’s views on shell shock have been challenged by subsequent scholars. With a few carefully crafted prompts, it can produce 1,500 words of analysis (albeit without citations) to support this argument.
There is already a cottage industry on blogs, Discord, and Reddit devoted to teaching the prompt syntax necessary to produce good results. There are, of course, a number of apps promising to detect AI-generated content too, but the reality is that most of them are highly ineffectual. In fact, OpenAI’s CEO, Sam Altman, has said recently that effective software would be nearly impossible to develop: bad human writing is easy to detect, but AI writing can be almost indistinguishable from good or excellent human writing. The University of Waterloo agrees, telling its faculty that “controlling the use of AI writing through surveillance or detection technology is not recommended; AI will continue to learn and if asked, will itself help to avoid the things its own architecture is using to detect it.” Let that sink in.
Media have been quick to point out the quirks and limitations of these new LLMs, but by overemphasizing their downsides we also risk overlooking the fact that they are impressive tools and for many applications are already “good enough”. Consider that although ChatGPT is a good technical writer, it is an even better editor. The program can take original but poorly written text and make it cogent while still preserving the author’s original ideas. It can also take text written in one language and output it in another, as it translates quite well—much better than Google Translate. This holds real potential to level the playing field for ESL students and academics especially. In less intrusive ways, ChatGPT can also help experienced writers brainstorm, overcome writer’s block, or make their prose more concise.
Many friends and colleagues who are skeptical of generative AI’s utility point to its inclination to make factual errors—sometimes elaborate fabrications. Most alarmingly, the false statements it generates can be expressed with remarkable confidence—a natural result of the predictive process that poses serious problems for the spread of misinformation. While this is true, it misses, in our minds, a more important point: generative AI is a powerful tool, but like any tool, it must have a skilled operator. And when the predicted text aligns with the facts (which it most often does), it actually works quite well.
This is why scholars are already starting to use it in the classroom and in their publications. A recent poll of Nature readers found that 80% had already tried generative AI tools and 43% had already used them in their research, mainly for writing code, helping to write manuscripts, conducting literature reviews, or producing presentations. History is a different discipline, of course, but the last three are things that historians do too—and because generative AI can write code, maybe more historians will want to explore digital approaches in the future!
While it is easy to be pessimistic about AI’s effects on the humanities in general and history in particular, it is worth remembering that it has great potential to speed up some of the more mundane and repetitive tasks we do as historians. Imagine a future in which thousands of pages of handwritten documents are quickly transcribed, proof-read, summarized, and analyzed by AI. Imagine the power of OCR-enabled LLMs if given access to pre-existing archival databases, such as Canadiana, Personnel Records of the First World War, or the Voyageurs Contracts Databases. However, if we are to harness the full potential of AI in the practice of historical research, we need to embrace collaboration––and digitally share historical documents––more than we currently do. The University of Toronto’s Canada Declassified offers a useful model where historians, researchers, and other document digitizers are encouraged to share and upload declassified Cold War-era records to Declassified’s databases, which are then made available to the general public for free.
For historians, the real promise of AI is that it will allow us to do some new and really useful things with big data. It will also let us do some of the things we were already doing much faster. Instead of word clouds and Venn diagrams, generative AI can help us sort through, summarize, synthesize, and analyze enormous bodies of information efficiently. This will almost certainly lead to a new era in the digital humanities, a digital humanities 2.0, if you will.
OpenAI’s Altman predicts that very soon, something called multimodal AI will be able to outsource tasks that chatbots do not do well, like math, fact-checking, and analyzing images, to external applications designed for those purposes. Once this happens, chatbots may become, in effect, virtual research assistants to help us navigate sources and analysis—just as GitHub’s Copilot is already doing for coding. But it is worth noting that AI will only be as good at these tasks as the historians using it. It will still just be a tool—at least for the foreseeable future. We will still need to frame our questions, check the sources ourselves—and then write up the results in an engaging way that speaks to the human experience.
Historians have a real opportunity here to help solve some of the technological problems at the heart of the AI dilemma. AI is very good at finding information, synthesizing it, and communicating it, but less so at being accurate and discerning fact from fiction. This is, of course, what historians do best. Historical methods seem especially well-suited to training the people now working with AI-generated content. They also seem apt for training AI to be more trustworthy, not just in making binary distinctions between correct and incorrect statements, but also in evaluating the grey area between those two extremes where the reasonable, debatable, and doubtful interpretations all live.
If you think this is farfetched, remember that all the training data used by ChatGPT at the moment is historical (its knowledge of the world ends in 2021), while a good chunk of its sources were either written by historians or are our primary historical sources. The methods we use to feel our way through these texts towards a reasonable, evidence-based argument might also make AI more trustworthy too.
While it might be tempting to pretend this is all a long way off, consider that Microsoft owns a 49% stake in OpenAI and has already announced plans to integrate ChatGPT into Microsoft Office, which includes Word, Excel, and PowerPoint. We have already used the beta-version of their new AI-enabled Edge browser, and when this comes to Office, there will be no going back. Consider too that Google’s transformer, Bard, was released at the beginning of February and will soon be integrated into its search engine and productivity software products—third parties are already doing this. We can resist this new technology all we want, but once it is part of the core research and writing applications we use every day, there will be no escaping it.
Bing Chat, which often mysteriously calls itself “Sydney” despite being required to keep that name a secret, agrees that historians and AI can work effectively together. In a series of discussions, we asked Sydney to consider what role it could play in the historical profession, telling it that many people are skeptical or nervous about its ability to meaningfully contribute to historical research. It explained that in addition to exploring lived experience, “History is also about patterns and trends; it’s about finding connections and correlations; it’s about discovering insights and implications; it’s about generating hypotheses and testing them against evidence. These are all things that AI can do well by using machine learning algorithms.” Did this mean it might one day take our place? “Not at all,” Sydney replied. “History is not a fixed or objective reality, but a dynamic and interpretive process. Human historians have their own values, biases, assumptions, and questions that shape their understanding of history. AI cannot replace that human element; it can only augment it with more data and tools. Human historians will still have to decide what questions to ask, what sources to trust, what arguments to make, and what implications to draw from history. AI can only assist them in those tasks, not replace them…Don’t you think that’s a better way to go than being afraid of me?”
Exciting or frightening, generative AI is here to stay. Are we ready?
Mark Humphries is a professor of history at Wilfrid Laurier University and Eric Story is a PhD Candidate in the same department. Humphries is launching a Substack on AI and history: https://generativehistory.substack.com