English professor Colette Colligan has helped develop a custom tool to mine media coverage and create new capabilities for large-scale text matching and comparative analysis.


SFU researcher uses big data to mine new insights into media coverage of the Oscar Wilde trials

April 27, 2018

In the spring of 1895, renowned Irish playwright, poet and author Oscar Wilde was caught in the eye of a TMZ-style media storm.

Over the course of three extensively reported trials, Wilde’s personal life was turned into a spectacle over accusations of homosexuality, male prostitution and sexual blackmail. Media outlets across the globe relished every detail of the high-profile scandal, collectively printing more than a million words’ worth of press coverage.

English professor Colette Colligan wondered where exactly those million words originated on the map, and where they diverged. Many of the news reports were almost exactly the same, yet altered in small but important ways. With Wilde’s sexuality on trial, the media coverage from the period offers unique insights into how ideas of sexuality were reported and censored in newspapers around the world. When collected, compared and mapped, what might the reports show us about news virality, national press standards and, ultimately, sexuality? The possibilities were fascinating.

But there was just one big problem. It would take her entire career to research, access and compare a body of text of that size and complexity.

A Wilde Idea

Facing a dreary eternity in library stacks and dusty archives, Colligan had a Wilde idea. If software like Turnitin, a popular plagiarism-detection service used by educators, could scan student essays and look for matches in a large database, could she develop a custom tool to mine media coverage and create new capabilities for large-scale text matching and comparative analysis?

Her quest led her beyond the walls of the English department to a collaboration with SFU Library software developer Michael Joyce and researchers Sarah Bull (now a Wellcome Trust Research Fellow at Cambridge University) and Cécile Loyen. Together, they conceptualized and built a custom database, or “textbase”, with a powerful algorithm that could process the kind of rich, cultural data the news reports contained. The Wilde Trials Web App was born.

New capabilities and new approaches to digital scholarship are opening up opportunities to engage with textual data on increasingly larger scales. To unlock knowledge about the cultural past and present, arts and humanities scholars are moving beyond traditional boundaries to explore data-intensive and computational methods for analyzing cultural materials. The Wilde Trials Web App shows how researchers across disciplines are advancing digital scholarship, digital making, and the study of digital culture.

The Wilde Trials Web App enables both large-scale, multi-directional comparison of thousands of news reports and close, detailed analysis of individual reports. To populate the database, Colligan’s team drew from international newspaper digitization projects, as well as optical character recognition (OCR) transcription. The software has now examined more than 1,200 news reports covering the scandal. It identifies not only all of the commonalities and variations in the vast textbase, but also when and where in the world they occurred.

“When you find those papers that have unique coverage, it's quite a discovery. It's different and independent,” she explains. “So that's revealing new information about the event as well as a new perspective.”
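
For readers curious about what this kind of text matching can look like in practice, here is a minimal Python sketch. It is not the Wilde Trials Web App's actual algorithm, which the article does not describe. It assumes a common text-reuse approach: measuring how many three-word sequences ("shingles") two reports share, then listing the places where their wording differs. The function names and the two sample sentences are invented for illustration.

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in the text."""
    tokens = words(text)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b, n=3):
    """Share of n-word shingles that two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def variations(a, b):
    """List (operation, words_in_a, words_in_b) for each place the texts differ."""
    wa, wb = words(a), words(b)
    matcher = SequenceMatcher(None, wa, wb, autojunk=False)
    return [
        (op, " ".join(wa[i1:i2]), " ".join(wb[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# Invented sample sentences, not quotations from the 1895 coverage.
report_a = "The prisoner was charged with a serious offence and remanded in custody."
report_b = "The prisoner was charged with a grave offence and remanded in custody."

print(round(jaccard(report_a, report_b), 2))
print(variations(report_a, report_b))
```

A high overlap score suggests that two papers drew on the same underlying report, while the list of variations points to the exact words one editor changed. Comparing every report against every other in this way would be one possible route to the commonalities and variations Colligan describes.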

The team’s work currently stands at the forefront of data-driven research methods on the 19th-century press. But beyond the trials, the application has the potential to save countless other scholars from bleary-eyed afternoons in the library stacks. Colligan sees the software as a compelling tool for other researchers and professionals, like journalists, who gather text and ask questions about plagiarism, text reuse, viral text and censorship.

New lab serves as big data research incubator

In summer 2016, Colligan co-launched SFU’s Digital Humanities Innovation Lab (DHIL) with English professor Michelle Levy, Web and Data Services Developer Michael Joyce, and Digital Scholarship Librarian Rebecca Dowson. As part of KEY, SFU’s Big Data Initiative, and in partnership with the SFU Library, the lab is a research incubator and source of training and mentorship for scholars across SFU. The DHIL team, which includes graduate research assistants, works to support SFU faculty and graduate students with digital scholarship through consultation, training, mentoring, research software development and technical support.

“We see it as laying the groundwork for different types of research that use data-intensive approaches, and that work with cultural data in particular,” Colligan says. “We're working with researchers, faculty and graduate students to mentor and support them in the work they're doing, and some of these projects can be experimental and exploratory.”

While Colligan’s project with the Wilde Trials Web App is highly data-intensive and relies on computational methods, other projects at the DHIL are focused on what she calls “archive building.” These researchers are leveraging the online environment to preserve and share archival information in interactive ways.

“We have a variety of projects on the go, keeping with the diversity of work that digital humanities encompasses,” she says. “Let's get humanities and social science scholars working with us who have questions, who are using digital and computational methods or working with cultural data and thinking about ways to preserve it and share it.”

Outside the lab and in her classroom, Colligan inspires her English and digital humanities students to explore digital tools and methods as a means of creative and critical expression. From Twitter narratives to map-based storytelling, she integrates her research practice with her teaching, introducing the next generation to new ways of interacting with historic and modern bodies of work.

“Digital humanities is not just about coding and learning how to code,” she says. “It's about being critical about digital research methods and digital analytical methods and also thinking critically about digital culture and digital media.”

The possibilities are endless.

Simon Fraser University is empowering our stakeholders to unlock big data for research, education and community impact. Building on a decade of leadership, SFU is investing in advanced research computing and connecting our campus community to new tools and resources to accelerate scholarship and innovation across every field of study for real world impact.
