Sciweavers

JASIS
2010

Linear time series models for term weighting in information retrieval

Common measures of term importance in information retrieval (IR) rely on counts of term frequency; rare terms receive higher weight in document ranking than common terms. However, realistic scenarios yield additional information about terms in a collection. Of interest in this paper is the temporal behavior of terms as a collection changes over time. We propose capturing each term's collection frequency at discrete time intervals over the lifespan of a corpus and analyzing the resulting time series. We hypothesize that the collection frequency of a weakly discriminative term x at time t is predictable by a linear model of the term's prior observations. On the other hand, a linear time series model for a strong discriminator's collection frequency will yield a poor fit to the data. Operationalizing this hypothesis, we induce three time-based measures of term importance and test these against state-of-the-art term weighting models.
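The hypothesis in the abstract can be illustrated with a small sketch. The abstract does not define the paper's three measures, so the code below is not a reproduction of them: it assumes a first-order autoregressive model (each observation regressed on the one before it) fitted by ordinary least squares, and it uses one minus the in-sample R² as a hypothetical importance score. The function names and the example counts are made up for illustration.

```python
def ar1_r_squared(series):
    """R^2 of a least-squares fit x_t ~ intercept + slope * x_{t-1}.

    `series` holds one term's collection frequency at successive,
    equally spaced time intervals.
    """
    x = [float(v) for v in series]
    if len(x) < 3:
        raise ValueError("need at least three observations")

    prev, curr = x[:-1], x[1:]  # (x_{t-1}, x_t) pairs
    n = len(curr)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n

    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    syy = sum((c - mean_curr) ** 2 for c in curr)

    if syy == 0:
        return 1.0  # constant targets are predicted exactly by the intercept

    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_curr - slope * mean_prev
    ss_res = sum((c - (intercept + slope * p)) ** 2 for p, c in zip(prev, curr))
    return 1.0 - ss_res / syy


def unpredictability_score(series):
    """Hypothetical importance score: a poor linear fit gives a high score."""
    return 1.0 - ar1_r_squared(series)


# Made-up collection frequencies over 12 time intervals.
steady_term = [100 + 3 * t for t in range(12)]           # smooth growth
bursty_term = [2, 0, 1, 40, 0, 3, 1, 0, 35, 2, 0, 1]     # sporadic bursts

print("steady:", unpredictability_score(steady_term))
print("bursty:", unpredictability_score(bursty_term))
```

Under these assumptions the steadily growing term is fitted almost perfectly and scores near zero, while the bursty term is fitted poorly and scores higher, which matches the intuition that a strong discriminator's frequency resists a linear fit. A higher-order model or an out-of-sample error measure would be equally plausible choices.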
Miles Efron
Added 28 Jan 2011
Updated 28 Jan 2011
Type Journal
Year 2010
Where JASIS
Authors Miles Efron