Machine Reading for Literary Texts

For much of history, knowledge has existed only in an unstructured format, and it remained that way until recently. The huge number of texts made available through digitization opens up new possibilities for researchers to access and analyze information quantitatively, at a scale that was not possible before.

Within the Department of English at Simon Fraser University (SFU), Professor Margaret Linley has assembled a research team that is taking on the challenge of developing machine reading techniques for the literary analysis of texts. 

Her project studies the Lake District travel books, digitized by Linley with SFU Special Collections and hosted by SFU Library. The team will investigate different machine reading systems for the books themselves, as well as for webpages that describe the literary, historical, geographical and cultural context of the travel books.

Machine reading is the process of extracting structured information from unstructured data, and it can be used to analyze and reveal new insights on a large scale. This has the potential to transform research in the digital humanities through a similar approach called “distant reading”. This approach finds statistical patterns in collections of literary texts, tracks how ideas, genres, topics and even moods and emotions circulate, and extracts relationship networks of characters. 
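To make the idea of extracting a relationship network concrete, here is a minimal sketch in Python. It is not the team's method; it only illustrates one simple way to turn unstructured text into structured data by counting how often two names appear in the same sentence. The function name `cooccurrence_network`, the naive sentence splitter, and the three-sentence sample text are all assumptions invented for this illustration.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_network(text, names):
    """Count how often each pair of names appears in the same sentence.

    Hypothetical helper for illustration only: real machine reading
    systems use far more robust sentence splitting and name recognition.
    """
    edges = Counter()
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Keep the names that occur as whole words in this sentence.
        present = sorted(
            n for n in names if re.search(rf"\b{re.escape(n)}\b", sentence)
        )
        # Every pair of names sharing a sentence adds one to that edge's weight.
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


# Invented toy text, not taken from the Lake District travel books.
sample = (
    "Wordsworth walked to Grasmere with Coleridge. "
    "Coleridge later wrote to Southey. "
    "Wordsworth and Coleridge met Southey at Keswick."
)
network = cooccurrence_network(sample, ["Wordsworth", "Coleridge", "Southey"])
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

Each entry in `network` is an edge between two names, weighted by the number of sentences they share. Applied across a whole collection, counts like these are the kind of statistical pattern that distant reading looks for.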

Linley’s project is partially funded by the Next Big Question fund (NBQ) on behalf of SFU's Big Data Initiative, which invests in knowledge mobilization projects that have the potential to transform the big data field. By bringing cross-disciplinary expertise to bear on the capabilities of machine reading systems, Linley and her team are advancing natural language processing, machine learning, and big data analysis.

"Our interdisciplinary research would not have been possible without NBQ funding, and the results of our explorations represent a major breakthrough in developing approaches to machine learning for digital humanists," says Linley.

This project will build machine reading expertise for SFU's Big Data Hub, and its effects are being felt across the university. "Thanks to NBQ funding, the work we have done places this project on the leading edge of the field of digital humanities research, while simultaneously building expertise in machine learning across disciplines at SFU and beyond."

Beyond the English disciplines, extracting knowledge from texts through machine reading can expand the frontiers of arts and science more broadly. With support from the NBQ Fund, Linley and her team were able to overcome disciplinary boundaries and build a cross-disciplinary collaboration that harnesses the transformative technology of machine reading. Now a partnership between researchers in Computer Sciences, Big Data, Linguistics and English is expanding leadership in this field to enhance the big data cluster. Lastly, the project is preserving qualitative digital research for years to come by offering unique mentoring and training opportunities for future innovators and leaders in machine learning and digital humanities.