As part of a project on discourse parsing, we have built SLSeg, a discourse segmenter based on syntactic and lexical information. A discourse segmenter takes text as input and produces as output the minimal discourse units in the text.

Our definition of 'minimal discourse unit' is directly inspired by Rhetorical Structure Theory. A discourse unit is:

  • An independent clause
  • A clause in an adjunct relation

SLSeg is described in the following paper:

Here, you can download the entire program, which includes the following resources:

  • A list of clause-like phrases that are in fact discourse markers (e.g., if you will, mind you).
  • A list of verbs used in to-infinitival and if complement clauses that should not be treated as separate discourse segments (e.g., decide in I decided to leave the car at home).
  • A list of unambiguous lexical cues for segment boundary insertion.
  • A list of attributive/cognitive verbs (e.g., think, said) used to prevent segmentation of floating attributive clauses.
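
To illustrate how word lists like these might act as filters on segment boundaries, here is a minimal sketch in Python. It is not SLSeg's actual code: the function names (keep_boundary, segment), the set names, and the idea of passing in candidate boundary positions are all assumptions made for illustration, and the list entries are taken only from the examples above, not from the files in the download. In this sketch the candidate boundaries are supplied by hand; a real segmenter would presumably derive them from a syntactic parse.

```python
# Hypothetical word lists. The entries come from the examples in the text,
# not from the resource files distributed with SLSeg.
DISCOURSE_MARKER_PHRASES = {("if", "you", "will"), ("mind", "you")}
# "decided" is listed alongside "decide" so the sketch needs no lemmatizer.
NON_SEGMENTING_VERBS = {"decide", "decided"}


def keep_boundary(tokens, i):
    """Return False if a candidate boundary before tokens[i] should be dropped."""
    lowered = [t.lower() for t in tokens]

    # Filter 1: a clause-like phrase that is really a discourse marker
    # starts at this position, so it should not open a new segment.
    for phrase in DISCOURSE_MARKER_PHRASES:
        if tuple(lowered[i:i + len(phrase)]) == phrase:
            return False

    # Filter 2: a to-infinitival or if complement directly following a
    # listed verb stays in the same segment as that verb.
    if lowered[i] in {"to", "if"} and i > 0 and lowered[i - 1] in NON_SEGMENTING_VERBS:
        return False

    return True


def segment(tokens, candidates):
    """Split tokens at the candidate positions that survive the filters."""
    kept = [i for i in sorted(candidates)
            if 0 < i < len(tokens) and keep_boundary(tokens, i)]
    bounds = [0] + kept + [len(tokens)]
    return [" ".join(tokens[a:b]) for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # The candidate boundary before "to" is dropped because "decided" is listed.
    print(segment("I decided to leave the car at home".split(), {2}))
    # The same position is kept when the preceding verb is not on the list.
    print(segment("I left to catch the bus".split(), {2}))
```

The design choice in this sketch is that the lists only remove boundaries proposed elsewhere; a list of unambiguous lexical cues for boundary insertion, like the one mentioned above, would work in the opposite direction by adding boundaries.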

Download SLSeg