Trough detection parameters

The trough detection algorithm takes a single parameter, a percentage value, which sets its sensitivity. A trough is marked as a potential topic break if the `hills' on both sides of it rise by more than this value, expressed as a percentage of the total range of the graph. A lower value therefore makes the algorithm more sensitive.
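The rule above can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the function name `detect_troughs`, the parameter name `sensitivity_pct`, and the way each `hill' is measured (climbing away from the trough until the graph stops rising) are assumptions made for illustration only.

```python
from typing import List, Sequence


def detect_troughs(values: Sequence[float], sensitivity_pct: float) -> List[int]:
    """Return indices of troughs whose hills on both sides rise by more than
    sensitivity_pct percent of the total range of the graph.

    Hypothetical sketch: the hill height is taken as the highest point reached
    by climbing away from the trough until the graph starts to fall again.
    """
    if len(values) < 3:
        return []
    total_range = max(values) - min(values)
    if total_range == 0:
        return []  # a flat graph has no hills, hence no troughs
    min_rise = total_range * sensitivity_pct / 100.0

    troughs = []
    for i in range(1, len(values) - 1):
        # Candidate trough: lower than its left neighbour and no higher than
        # its right neighbour (a plateau is reported once, at its left edge).
        if not (values[i] < values[i - 1] and values[i] <= values[i + 1]):
            continue
        # Climb the left-hand hill for as long as the graph does not fall.
        left = i
        while left > 0 and values[left - 1] >= values[left]:
            left -= 1
        # Climb the right-hand hill in the same way.
        right = i
        while right < len(values) - 1 and values[right + 1] >= values[right]:
            right += 1
        left_rise = values[left] - values[i]
        right_rise = values[right] - values[i]
        # Both hills must rise by more than the threshold.
        if left_rise > min_rise and right_rise > min_rise:
            troughs.append(i)
    return troughs


graph = [50, 10, 60, 55, 60, 0, 100]   # made-up values, total range 100
print(detect_troughs(graph, 30))       # shallow dip at index 3 is ignored
print(detect_troughs(graph, 3))        # a low threshold also marks index 3
```

With the made-up graph above, raising the percentage discards the shallow dip between the two peaks of 60, while lowering it marks that dip as a potential topic break as well.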

The challenge is to tune the sensitivity so that as many topics as possible are located without detecting false positives (or at least without detecting too many). Due to the nature of the input data, a given sensitivity setting produces a large number of detected topic breaks for some dialogues and none at all for others (see chapter 5, including figures 5.8 and 5.9). While the precise reasons for this are explored in chapters 5 and 6, a guiding assumption during the design of the experiments presented here was that high variation in topic change frequency is acceptable: some dialogues may change topic more frequently than others.

Experiments were performed using the following sensitivity settings:

The resulting topic detections were compared to the relevant human-annotated documents to determine the level that best produces a `clean', conservative set of potential topic breaks. See chapter 6 for a discussion of what false positives from this algorithm mean, and of what the algorithm cannot detect.
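The text does not state how the comparison against the annotations was scored. As a hedged illustration only, one simple approach counts a detected break as correct when it falls within a small distance of an annotated break. The function name `score_breaks`, the `tolerance` parameter, and the representation of breaks as integer positions in the dialogue are all assumptions, not details taken from the experiments.

```python
from typing import Iterable, Tuple


def score_breaks(detected: Iterable[int],
                 annotated: Iterable[int],
                 tolerance: int = 0) -> Tuple[int, int]:
    """Return (true_positives, false_positives) for a set of detected breaks.

    Hypothetical scoring: a detected break counts as a true positive if some
    annotated break lies within `tolerance` positions of it.
    """
    annotated = list(annotated)
    true_pos = 0
    false_pos = 0
    for d in detected:
        if any(abs(d - a) <= tolerance for a in annotated):
            true_pos += 1
        else:
            false_pos += 1
    return true_pos, false_pos


# Made-up positions: three detected breaks, two annotated ones.
print(score_breaks([1, 5, 9], [2, 9], tolerance=1))
```

Under this kind of scoring, a `clean' conservative setting would be one that keeps the false positive count low, even at the cost of missing some annotated breaks.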

James Ballantine 2005-02-19