Sciweavers

SPIRE
2005
Springer

Deriving TF-IDF as a Fisher Kernel

The Dirichlet compound multinomial (DCM) distribution has recently been shown to be a good model for documents because it captures the phenomenon of word burstiness, unlike standard models such as the multinomial distribution. This paper investigates the DCM Fisher kernel, a function for comparing documents derived from the DCM. We show that the DCM Fisher kernel has components that are similar to the term frequency (TF) and inverse document frequency (IDF) factors of the standard TF-IDF method for representing documents. Experiments show that the DCM Fisher kernel performs better than alternative kernels for nearest-neighbor document classification, but that the TF-IDF representation still performs best.
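As a point of reference for the TF-IDF baseline mentioned in the abstract, here is a minimal sketch of one common TF-IDF variant (log-damped term frequency times log inverse document frequency) with cosine-similarity nearest-neighbor classification. The exact weighting used in the paper is not given in this abstract, so the formula and the function names `tfidf_vectors`, `cosine`, and `nearest_neighbor_label` are illustrative assumptions, and this is not the DCM Fisher kernel itself.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: weight} dict.

    Assumed weighting (one common variant, not necessarily the paper's):
    weight = log(1 + term count) * log(number of documents / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: math.log(1 + c) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbor_label(query_vec, train_vecs, train_labels):
    """Return the label of the training vector most similar to the query."""
    best = max(range(len(train_vecs)), key=lambda i: cosine(query_vec, train_vecs[i]))
    return train_labels[best]
```

The log damping of raw counts is a simple way to reduce the influence of repeated occurrences of the same word in a document, which is loosely related to the burstiness effect the abstract describes.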
Charles Elkan
Added 28 Jun 2010
Updated 28 Jun 2010
Type Conference
Year 2005
Where SPIRE
Authors Charles Elkan