
COLING
2000

A Method of Measuring Term Representativeness - Baseline Method Using Co-occurrence Distribution

This paper introduces a scheme, which we call the baseline method, for defining measures of term representativeness, together with several measures defined using the scheme. The representativeness of a term is measured by a normalized characteristic value defined for the set of all documents that contain the term. Normalization is done by comparing the original characteristic value with the characteristic value of a randomly chosen document set of the same size. The latter value is estimated by a baseline function obtained by random sampling and log-linear approximation. We found that the distance between the word distribution in a document set and the word distribution in the whole corpus is an effective characteristic value for the baseline method. Measures defined by the baseline method have several advantages: they can be used to compare the representativeness of two terms with very different frequencies, and they have well-defined threshold values of being re...
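The pipeline summarized in the abstract (characteristic value for the documents containing a term, baseline estimated by random sampling plus a log-linear fit, then normalization) can be sketched in Python as below. This is a minimal, hypothetical sketch, not the authors' implementation. Several details are assumptions of this sketch: the distance is taken to be a log-likelihood-ratio form (the subset's token count times the Kullback-Leibler divergence from the corpus distribution), "size" is taken to mean the number of tokens, sampled sets are capped at half the corpus, normalization is done as a ratio, and all function names (`word_counts`, `distance`, `fit_baseline`, `representativeness`) are illustrative.

```python
import math
import random
from collections import Counter
from typing import Callable, List

Doc = List[str]  # a tokenized document


def word_counts(docs: List[Doc]) -> Counter:
    """Aggregate token counts over a set of documents."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(doc)
    return counts


def distance(sub: Counter, corpus: Counter, corpus_total: int) -> float:
    """Assumed characteristic value: N_D * KL(P_D || P_0), a log-likelihood-ratio
    distance between the subset's word distribution P_D and the corpus's P_0."""
    sub_total = sum(sub.values())
    return sum(
        k * math.log((k / sub_total) / (corpus[w] / corpus_total))
        for w, k in sub.items()
    )


def fit_baseline(docs: List[Doc], corpus: Counter, corpus_total: int,
                 n_samples: int = 300, seed: int = 0) -> Callable[[int], float]:
    """Estimate B(n), the distance expected for a random document set of n
    tokens, by random sampling and a least-squares fit of log B = a*log n + b."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_samples):
        # Assumption: sample up to half the corpus so samples stay small
        # relative to the whole collection.
        sample = rng.sample(docs, rng.randint(1, max(1, len(docs) // 2)))
        counts = word_counts(sample)
        d = distance(counts, corpus, corpus_total)
        if d > 1e-12:  # log is undefined at 0, so skip degenerate samples
            xs.append(math.log(sum(counts.values())))
            ys.append(math.log(d))
    if len(xs) < 2:
        raise ValueError("not enough non-degenerate samples to fit a baseline")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sample sizes do not vary; cannot fit a slope")
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    return lambda n: math.exp(a * math.log(n) + b)


def representativeness(term: str, docs: List[Doc], corpus: Counter,
                       corpus_total: int, baseline: Callable[[int], float]) -> float:
    """Distance of D(term), the documents containing `term`, divided by the
    baseline for a random set of the same token count (ratio is assumed)."""
    containing = [doc for doc in docs if term in doc]
    if not containing:
        raise ValueError(f"term {term!r} does not occur in the corpus")
    counts = word_counts(containing)
    return distance(counts, corpus, corpus_total) / baseline(sum(counts.values()))
```

To use it, compute `corpus = word_counts(docs)` and `total = sum(corpus.values())` once, fit `baseline = fit_baseline(docs, corpus, total)` once, and then call `representativeness(term, docs, corpus, total, baseline)` for each term. Because every term is compared against a random set of its own size, a frequent term and a rare term end up on the same scale: under the ratio assumption, a value near 1 means the term's documents look like a random sample, and a term that occurs in every document scores exactly 0.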
Toru Hisamitsu, Yoshiki Niwa, Jun-ichi Tsujii
Added 01 Nov 2010
Updated 01 Nov 2010
Type Conference
Year 2000
Where COLING