The trend in information retrieval systems is from document to sub-document retrieval, such as sentences in a summarization system and words or phrases in question-answering syste...
Inverse document frequency (IDF) is one of the most useful and widely used concepts in information retrieval. There have been various attempts to provide theoretical justification...
The Dirichlet compound multinomial (DCM) distribution has recently been shown to be a good model for documents because it captures the phenomenon of word burstiness, unlike standar...
Informal communication (e-mail, bulletin boards) poses a difficult learning environment because traditional grammatical and lexical information are noisy. Other information is nec...
Many daily activities present information in the form of a stream of text, and often people can benefit from additional information on the topic discussed. TV broadcast news can b...
Monika Rauch Henzinger, Bay-Wei Chang, Brian Milch...
A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the text of the page, and this is a substantial source of rev...