Sciweavers

» Indexing Shared Content in Information Retrieval Systems
CIKM
2010
Springer
13 years 6 months ago
Fast dimension reduction for document classification based on imprecise spectrum analysis
This paper proposes an algorithm called Imprecise Spectrum Analysis (ISA) to carry out fast dimension reduction for document classification. ISA is designed based on the one-sided...
Hu Guan, Bin Xiao, Jingyu Zhou, Minyi Guo, Tao Yan...
CIKM
2009
Springer
14 years 2 months ago
Topic and keyword re-ranking for LDA-based topic modeling
Topic-based text summaries promise to help average users quickly understand a text collection and derive insights. Recent research has shown that the Latent Dirichlet Allocation (...
Yangqiu Song, Shimei Pan, Shixia Liu, Michelle X. ...
SIGIR
2010
ACM
14 years 6 days ago
Adaptive near-duplicate detection via similarity learning
In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
CIKM
2006
Springer
14 years 1 days ago
Performance thresholding in practical text classification
In practical classification, there is often a mix of learnable and unlearnable classes and only a classifier above a minimum performance threshold can be deployed. This problem is...
Hinrich Schütze, Emre Velipasaoglu, Jan O. Pe...
CIKM
2008
Springer
13 years 10 months ago
Combining concept hierarchies and statistical topic models
Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can p...
Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyv...