This paper describes TextTiling, an algorithm for partitioning expository texts into coherent multi-paragraph discourse units which reflect the subtopic structure of the texts. The algorithm uses domain-independent lexical frequency and distribution information to recognize the interactions of multiple simultaneous themes. Two fully-implemented versions of the algorithm are described and shown to produce segmentation that corresponds well to human judgments of the major subtopic boundaries of thirteen lengthy texts.
Marti A. Hearst