Topic Segmentation in Spoken Dialogue

James Ballantine

james at

June 2004

This is a browsable version. The original format, including appendices, is available in the PDF version.


Topic segmentation is the division of linguistic data into semantically coherent blocks, based on the topics they cover. Traditional topic segmentation techniques focus on written text; spoken dialogue presents different challenges, such as the absence of paragraph markings, different lexical cues for topic change, and a lack of organised structure.

This thesis first describes the implementation and evaluation of existing techniques applied to the special case of spoken dialogue transcripts. Second, it describes experiments, and their evaluation, aimed at improving topic segmentation results using cues specific to the spoken dialogue domain and heuristic topic-change detection.

James Ballantine 2005-02-19