As part of a project on extracting sentiment from text, we have collected a corpus of movie, book, and consumer product reviews. For more information on the corpus collection, and the project it is part of, see the Project Description for "Computational analysis of text sentiment" and the following publications:
The reviews were downloaded in 2004 from the Epinions web site by Jack Grieve. They are divided into the following categories, with 25 positive and 25 negative reviews in each category. The classification into positive and negative was based on the "recommended" or "not recommended" tag that the reviewer provided.
The entire corpus in .txt format, structured in directories by product and with file names indicating positive and negative.
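As a rough illustration of how the directory layout could be read, the following Python sketch walks one directory per product and derives the polarity from each file name. The file-name prefixes ("yes" for positive, "no" for negative), the UTF-8 encoding, and the function names are assumptions made for this example, not part of the corpus documentation; adjust them to the files you actually download.

```python
from pathlib import Path


def label_from_name(file_name, positive_prefix="yes", negative_prefix="no"):
    """Return 'positive', 'negative', or None based on a file-name prefix.

    The default prefixes are assumptions for illustration only.
    """
    name = file_name.lower()
    if name.startswith(positive_prefix):
        return "positive"
    if name.startswith(negative_prefix):
        return "negative"
    return None


def load_reviews(corpus_root):
    """Collect (category, label, text) tuples, one directory per product."""
    reviews = []
    for path in sorted(Path(corpus_root).glob("*/*.txt")):
        label = label_from_name(path.name)
        if label is None:
            continue  # skip files that do not match the assumed naming
        # Assumed encoding; undecodable bytes are replaced rather than raising.
        text = path.read_text(encoding="utf-8", errors="replace")
        reviews.append((path.parent.name, label, text))
    return reviews
```

Calling load_reviews on the unpacked corpus directory would then return one tuple per review, with the product directory name as the category.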
The corpus is annotated with RST relations at the sentence level (i.e., no full-text analysis; only those relations found within sentences). Texts were annotated with RST Tool, and the tool is necessary to view the annotations. Annotations by Maite Taboada and Montana Hay.
Annotations of subjectivity types using the Appraisal framework. The very first step in the annotation involved the creation of system networks to use in the coding process. The following are system networks created using UAM Corpus Tool, and based on the explanations in J. Martin and P. White (2005) The Language of Evaluation: Appraisal in English. London: Palgrave Macmillan. The system networks and annotations were created by Maite Taboada and Patrick Larrivee-Woods.
Only a subsection of the corpus has been annotated using Appraisal: movies, books and hotels (150 texts).
Annotation of negation and speculation, loosely based on the BioScope corpus annotation. For more detail on the annotation, please see:
Konstantinova, N., S. de Sousa, N. P. Cruz, M. J. Maña, M. Taboada and R. Mitkov (2012) A review corpus annotated for negation, speculation and their scope. Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC). Istanbul, Turkey. May 2012.
This corpus is a collection of 400 reviews of cars, hotels, washing machines, books, cell phones, music, computers, and movies. Each category contains 50 reviews, 25 positive and 25 negative, defined as positive or negative based on the number of stars given by the reviewer (1-2 = negative; 4-5 = positive; 3-star reviews are not included). The reviews were collected from the website Ciao.es. They are intended to be a Spanish parallel to the SFU Review Corpus (in English).
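The star-based labelling rule described above can be written out as a short Python sketch. The function name and the choice to return None for excluded 3-star reviews are assumptions made for illustration; only the thresholds (1-2 negative, 4-5 positive, 3 excluded) come from the corpus description.

```python
def label_from_stars(stars):
    """Map a 1-5 star rating to a polarity label.

    Returns 'negative' for 1-2 stars, 'positive' for 4-5 stars, and
    None for 3 stars, which are not included in the corpus.
    """
    if stars in (1, 2):
        return "negative"
    if stars in (4, 5):
        return "positive"
    if stars == 3:
        return None  # 3-star reviews are excluded
    raise ValueError(f"expected a rating from 1 to 5, got {stars!r}")


# Hypothetical ratings, used only to show the filtering step.
ratings = [1, 3, 5, 4, 2]
labels = [label_from_stars(s) for s in ratings if label_from_stars(s) is not None]
```

Filtering out the None values, as in the last line, mirrors the exclusion of 3-star reviews from the collection.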
Any comments or suggestions on the corpora and the annotations are more than welcome. Please let me know if you use them in your research: Maite Taboada (firstname.lastname@example.org).
©2004-2017 Maite Taboada, Julian Brooke, Jack Grieve, Montana Hay, Patrick Larrivee-Woods
©2017 Maite Taboada, Natalia Konstantinova, Sheila de Sousa