Computer taught to write stories

December 01, 2005, vol. 34, no. 7
By Barry Shell

Anoop Sarkar finds it amusing that he and his graduate students are developing software that could write this article some day.

The project, called SQuASH (SFU question answering summary handler), scans multiple documents, such as newspaper articles or academic abstracts, and then creates a short, readable summary based on a set of questions provided by the user.

According to Sarkar, journalists should start worrying. “Our natural language processing ideas are modelled after the way babies learn language.

“They learn rapidly from input they observe because they combine important prior knowledge with novel experiences,” says Sarkar, an assistant professor of computing science.

As a teenager, Sarkar was fascinated by computer programming languages. “But computer languages are artificial languages. They are designed to be simple and easy to process so that each program has exactly one meaning,” he says.

In contrast, natural human languages are complex and full of ambiguity. At Pune University in India, Sarkar wrote a compiler that translated computer programs into low-level instructions for computers.

“I felt inspired to try to find similar tools for natural languages,” he says. That led him to graduate school at the University of Pennsylvania where he studied under Aravind Joshi, a pioneer in the field.

Consider the phrase from Sarkar's remark, “natural language processing ideas”. It has five possible structures with different meanings.

Adding one more word (e.g. natural language processing research ideas) yields 14 possible meanings. Nine words combine in 1,430 ways. How do you pick the right one?
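
The counts quoted above (five, 14 and 1,430) match the Catalan numbers, which count the ways a sequence of words can be grouped into nested pairs. The article does not say how the figures were computed, so the sketch below rests on the assumption that each "structure" is a binary bracketing of the phrase. The function names are illustrative only and are not taken from SQuASH.

```python
from math import comb


def count_bracketings(n_words: int) -> int:
    """Count binary bracketings of n_words words.

    Assumption: this is the Catalan number C(n_words - 1).
    """
    if n_words < 1:
        raise ValueError("need at least one word")
    n = n_words - 1
    return comb(2 * n, n) // (n + 1)


def bracketings(words):
    """Yield every binary bracketing of the word list as a string."""
    if len(words) == 1:
        yield words[0]
        return
    # Try every split point, then bracket each side recursively.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                yield f"({left} {right})"


phrase = "natural language processing ideas".split()
structures = list(bracketings(phrase))

for structure in structures:
    print(structure)

for n_words in (4, 5, 9):
    print(n_words, "words:", count_bracketings(n_words), "structures")
```

One of the printed groupings, ((natural language) (processing ideas)), pairs "natural" with "language" first; another, (natural (language (processing ideas))), attaches each word to everything on its right. Nothing in the counting says which grouping a reader would actually intend.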

Sarkar's research helps computers learn how to choose. “We expose our learning software to hundreds of thousands of cases for which the most plausible meaning is provided by humans,” says Sarkar.

But this is not enough. For effective machine learning, the software is also shown additional examples, analogous to the way a baby observes language without supervision. The computer uses its statistical prior knowledge to make an educated guess about which meaning is the most plausible one.

This core set of statistical “natural language processing ideas” can be applied to translation, mining information from text, and summarization.

So, if a future computer could write an article like this one, would you still read it? Or would you just ask your computer to read it for you and give you a summary?
