In this paper, we complement the term frequency, which is used in many bag-of-words based information retrieval models, with information about the semantic relatedness of query and...
In this paper, we reveal a common deficiency of the current retrieval models: the component of term frequency (TF) normalization by document length is not lower-bounded properly;...
Informal communication (e-mail, bulletin boards) poses a difficult learning environment because traditional grammatical and lexical information is noisy. Other information is nec...
Weighting models use lexical statistics, such as term frequencies, to derive term weights, which are used to estimate the relevance of a document to a query. Apart from t...
We study methods to initialize or bias different clustering methods using prior information about the "importance" of a keyword with respect to the whole document collection or a...