Sciweavers

SIGIR
2003
ACM

Domain-independent text segmentation using anisotropic diffusion and dynamic programming

This paper presents a novel domain-independent text segmentation method that identifies the boundaries of topic changes in long text documents and/or text streams. The method consists of three components. As a preprocessing step, we eliminate the document-dependent stop words as well as the generic stop words before the sentence similarity is computed. This step helps discriminate the semantic information of sentences. Then the cohesion information of sentences in a document or a text stream is captured with a sentence-distance matrix, in which each entry corresponds to the similarity between a sentence pair. The distance matrix can be represented as a gray-scale image. Thus, the text segmentation problem is converted into an image segmentation problem. We apply the anisotropic diffusion technique to the image representation of the distance matrix to enhance the semantic cohesion of sentence topical groups as well as sharpen topical boundaries. Finally, the dynamic programm...
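The first two stages described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption rather than the authors' implementation: the stop-word list is a tiny hypothetical one (document-dependent stop words are not handled), similarity is taken to be bag-of-words cosine similarity, and the diffusion is a standard Perona-Malik scheme with an exponential edge-stopping function. The function names and the parameters `n_iter`, `kappa`, and `step` are illustrative. The dynamic programming stage is not sketched because the abstract is cut off before describing it.

```python
import math
import re

# Hypothetical generic stop-word list, for illustration only.
GENERIC_STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "on", "as"}


def similarity_matrix(sentences, stop_words=GENERIC_STOP_WORDS):
    """Cosine similarity of bag-of-words counts for every sentence pair."""
    bags = []
    for sentence in sentences:
        bag = {}
        for word in re.findall(r"[a-z]+", sentence.lower()):
            if word not in stop_words:
                bag[word] = bag.get(word, 0) + 1
        bags.append(bag)

    def cosine(a, b):
        dot = sum(count * b.get(word, 0) for word, count in a.items())
        norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
            sum(c * c for c in b.values())
        )
        return dot / norm if norm else 0.0

    return [[cosine(a, b) for b in bags] for a in bags]


def anisotropic_diffusion(image, n_iter=10, kappa=0.3, step=0.2):
    """Perona-Malik diffusion on a 2-D list, using the 4 nearest neighbours."""
    rows, cols = len(image), len(image[0])
    img = [[float(v) for v in row] for row in image]
    for _ in range(n_iter):
        new = [row[:] for row in img]
        for i in range(rows):
            for j in range(cols):
                update = 0.0
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    # Clamp indices so border cells see a copy of themselves.
                    ni = min(max(i + di, 0), rows - 1)
                    nj = min(max(j + dj, 0), cols - 1)
                    diff = img[ni][nj] - img[i][j]
                    # Edge-stopping weight: close to 1 for small differences,
                    # close to 0 across large jumps, so boundaries are kept.
                    update += math.exp(-((diff / kappa) ** 2)) * diff
                new[i][j] = img[i][j] + step * update
        img = new
    return img


# Toy input: two sentences about one topic followed by two about another.
sentences = [
    "the cat sat on the mat",
    "a cat chased the mouse",
    "stocks fell on the market",
    "the market rallied as stocks rose",
]
sim = similarity_matrix(sentences)
smoothed = anisotropic_diffusion(sim)
```

In this sketch `step` is kept at or below 0.25 so that each update is a weighted average of a cell and its four neighbours, which keeps the explicit scheme stable. Smaller values of `kappa` preserve sharper block boundaries at the cost of less smoothing inside each block.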
Xiang Ji, Hongyuan Zha
Added 05 Jul 2010
Updated 05 Jul 2010
Type Conference
Year 2003
Where SIGIR
Authors Xiang Ji, Hongyuan Zha