Maite Taboada
SLSeg - A Syntactic and Lexical-Based Discourse Segmenter

As part of a project on discourse parsing, we have built a discourse segmenter based on syntactic and lexical information. A discourse segmenter takes text as input and produces as output the minimal discourse units in the text.

Our definition of 'minimal discourse unit' is directly inspired by Rhetorical Structure Theory. A discourse unit is:

  • An independent clause
  • A clause in an adjunct relation

SLSeg is described in the following paper:

Here, you can download the entire program, which includes the following resources:

  • A list of clause-like phrases that are in fact discourse markers (e.g., if you will, mind you).
  • A list of verbs used in to-infinitival and if complement clauses that should not be treated as separate discourse segments (e.g., decide in I decided to leave the car at home).
  • A list of unambiguous lexical cues for segment boundary insertion.
  • A list of attributive/cognitive verbs (e.g., think, said), used to prevent segmentation of floating attributive clauses.
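
To illustrate how lexical lists of this kind can guide boundary insertion, here is a minimal Python sketch. It is not SLSeg's code: the function name `segment`, the two small word sets, and the rule itself are assumptions made for illustration, and the sketch leaves out the syntactic information that SLSeg also relies on. It inserts a boundary before a lexical cue unless the cue falls inside a clause-like discourse marker.

```python
import re

# Hypothetical stand-ins for the resource lists; the real lists ship with SLSeg.
DISCOURSE_MARKERS = {"if you will", "mind you"}
# "if" is included only to show how a marker blocks a boundary in this toy example.
BOUNDARY_CUES = {"because", "although", "if"}


def find_spans(phrases, text):
    """Return (start, end) character spans of every whole-word phrase match."""
    spans = []
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        spans.extend((m.start(), m.end()) for m in re.finditer(pattern, text))
    return spans


def segment(sentence):
    """Split a sentence before each boundary cue that is not part of a discourse marker."""
    lowered = sentence.lower()
    protected = find_spans(DISCOURSE_MARKERS, lowered)
    cuts = []
    for start, _ in find_spans(BOUNDARY_CUES, lowered):
        inside_marker = any(s <= start < e for s, e in protected)
        # A cue at the very start of the sentence opens the first segment; no cut needed.
        if start > 0 and not inside_marker:
            cuts.append(start)
    segments = []
    previous = 0
    for cut in sorted(cuts):
        segments.append(sentence[previous:cut].strip())
        previous = cut
    segments.append(sentence[previous:].strip())
    return [s for s in segments if s]


if __name__ == "__main__":
    for unit in segment("It was, if you will, a disaster because nobody planned."):
        print(unit)
```

In this sketch the "if" inside "if you will" does not trigger a boundary, while "because" does, so the example sentence is split into two units.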

Download SLSeg (please enter your name and e-mail address so that we can keep you informed of new releases).

©2009 Maite Taboada, Milan Tofiloski, Julian Brooke

 