The SFU Review Corpus
Maite Taboada
Simon Fraser University
mtaboada@sfu.ca


As part of a project on extracting sentiment from text, we have collected a corpus of movie, book, and consumer product reviews. For more information on the corpus collection and the project it is part of, see the Project Description for "Computational analysis of text sentiment" (http://www.sfu.ca/~mtaboada/research/nserc-project.html) and the following publications:

    * Taboada, M., C. Anthony and K. Voll (2006) Methods for Creating Semantic Orientation Dictionaries. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC). Genoa, Italy. May 2006. pp. 427-432.
    * Taboada, M. and J. Grieve (2004) Analyzing Appraisal Automatically. American Association for Artificial Intelligence Spring Symposium on Exploring Attitude and Affect in Text. Stanford. March 2004. AAAI Technical Report SS-04-07. pp. 158-161.

The reviews were downloaded in 2004 from the Epinions web site by Jack Grieve. They are divided into the following eight categories, with 25 positive and 25 negative reviews in each category, for a total of 400 reviews. The classification into positive and negative was based on the "recommended" or "not recommended" tag that the reviewer provided.

    * Books
    * Cars
    * Computers
    * Cookware
    * Hotels
    * Movies
    * Music
    * Phones

This zip file contains the entire corpus in .txt format, organized in one directory per product category, with file names indicating whether each review is positive or negative.
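
The directory layout described above can be read with a short script. The following Python sketch is illustrative only and is not part of the corpus distribution. It assumes that the zip file has been extracted to a single root directory, that each category directory sits directly below that root, and that positive and negative file names begin with "yes" and "no" respectively. These prefixes, the directory name SFU_Review_Corpus, and the function names are assumptions; adjust them to match the extracted files.

```python
from collections import Counter
from pathlib import Path

# ASSUMPTION: file names start with one of these prefixes to mark polarity.
# Change the mapping if the extracted files are named differently.
LABEL_PREFIXES = {"yes": "positive", "no": "negative"}


def label_for(filename, prefixes=LABEL_PREFIXES):
    """Return the polarity implied by a file name, or None if no prefix matches."""
    lowered = filename.lower()
    for prefix, label in prefixes.items():
        if lowered.startswith(prefix):
            return label
    return None


def load_reviews(corpus_root, prefixes=LABEL_PREFIXES):
    """Return (category, label, text) for each labelled .txt file
    found exactly one directory below corpus_root."""
    reviews = []
    for path in sorted(Path(corpus_root).glob("*/*.txt")):
        label = label_for(path.name, prefixes)
        if label is None:
            continue  # skip files whose names carry no recognised label
        # The file encoding is not documented here; undecodable bytes are replaced.
        text = path.read_text(encoding="utf-8", errors="replace")
        reviews.append((path.parent.name, label, text))
    return reviews


def count_by_category(reviews):
    """Count reviews per (category, label) pair."""
    return Counter((category, label) for category, label, _ in reviews)


if __name__ == "__main__":
    # ASSUMPTION: hypothetical name of the directory the zip was extracted to.
    reviews = load_reviews("SFU_Review_Corpus")
    for (category, label), n in sorted(count_by_category(reviews).items()):
        print(category, label, n)
```

If the assumptions hold, each category should show 25 positive and 25 negative reviews; any other count suggests that the prefixes or the root directory need adjusting.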

The raw corpus, and other types of annotations, are available from the SFU Review Corpus site (http://www.sfu.ca/~mtaboada/research/SFU_Review_Corpus.html). 


2006-2008 Maite Taboada, Jack Grieve, Montana Hay, Patrick Larrivee-Woods

http://www.sfu.ca/~mtaboada/research/nserc-project.html
http://www.sfu.ca/~mtaboada/