This week in the Defining Cognitive Science Speaker Series, Maite Taboada spoke about her work extracting sentiment (opinion) from a discourse. For example, online movie or restaurant reviews can be categorized as either positive (“the movie is great”) or negative (“I don’t recommend this movie to anyone”). Maite offered three approaches, in increasing order of accuracy.
The first and most popular approach analyses keywords and adjectives. In this method, the semantic orientation of certain adjectives is stored in a lexicon, and the words in the text, along with their distribution, are matched against it. Down-players and intensifiers such as “the least bit”, “very”, or “extremely” are also recognized. A sentiment score is then calculated from the number of positive and negative occurrences. This method achieves approximately 62-68% accuracy.
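To make the keyword approach concrete, here is a minimal sketch in Python. It is not the system from the talk: the lexicon entries, the numeric values, and the function names are all invented for illustration, and only single-word modifiers are handled (a phrase like “the least bit” would need extra matching logic).

```python
# Hypothetical mini-lexicon: word -> semantic orientation.
# The words and values are illustrative assumptions, not taken from the talk.
LEXICON = {"great": 3.0, "good": 2.0, "boring": -2.0, "terrible": -3.0}

# Hypothetical modifiers: intensifiers scale a score up, down-players scale it down.
MODIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}

PUNCTUATION = ".,!?;:\"'"


def sentiment_score(text):
    """Sum the lexicon values of the words in text, scaled by a preceding modifier."""
    tokens = [tok.strip(PUNCTUATION) for tok in text.lower().split()]
    score = 0.0
    for i, word in enumerate(tokens):
        if word not in LEXICON:
            continue
        value = LEXICON[word]
        # Apply a modifier only if it directly precedes the sentiment word.
        if i > 0:
            value *= MODIFIERS.get(tokens[i - 1], 1.0)
        score += value
    return score


def classify(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With these made-up values, `classify("The movie is very good.")` gives a positive label and `classify("An extremely boring film")` a negative one. A sentence such as “I don’t recommend this movie to anyone” contains no lexicon word and comes out neutral, which hints at why keyword matching alone has limited accuracy.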
The second approach looks at topic sentences. The system is trained to read up to eight different types of reviews, using a decision-tree algorithm. Only topic sentences are analysed; however, identifying which sentences are topic sentences is the biggest obstacle to success. This method improves a little on the keyword analysis, achieving approximately 69-73% accuracy.
The third approach is based on discourse structure. It ignores elaborations and relies partly on the pattern of opinions occurring nearer the end of sentences. It looks at the rhetorical structure of the text, relying heavily on concessive relations to build blocks of coherent ideas at the sentence level. These rhetorical relations are divided into categories such as results, concessions, causes, and elaborations. Accuracy reaches around 70% with this approach, but it depends on a precise parser.
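The idea of discarding elaborations and favouring later material can be sketched as follows. This is only an illustration under stated assumptions: the segments are assumed to arrive already labelled with a rhetorical relation and a sentiment value by some upstream parser (not shown), and the linear position weighting is an invented stand-in for the "opinions come later" pattern, not the method presented in the talk.

```python
from typing import List, Tuple

# Relation types to skip; the talk mentions that elaborations are ignored.
IGNORED_RELATIONS = {"elaboration"}


def discourse_score(segments: List[Tuple[str, float]]) -> float:
    """Combine per-segment sentiment values into one score.

    segments: (relation, sentiment) pairs in text order. Both the labels and
    the sentiment values are assumed to come from earlier processing steps.
    """
    kept = [value for relation, value in segments
            if relation not in IGNORED_RELATIONS]
    if not kept:
        return 0.0
    # Hypothetical weighting: the k-th kept segment gets weight k,
    # so later segments count more than earlier ones.
    weights = range(1, len(kept) + 1)
    return sum(w * v for w, v in zip(weights, kept)) / sum(weights)
```

For example, a review that concedes a flaw, elaborates on it, and then ends on a positive result, `[("concession", -2.0), ("elaboration", -3.0), ("result", 3.0)]`, gets a positive overall score here because the elaboration is dropped and the final segment carries the most weight. The sketch also shows why parser quality matters: a mislabelled relation changes which segments are counted at all.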
This article leaves out interesting examples and research from the original presentation. Please visit Maite’s website if you would like to read more about her research or view her presentation slides yourself.