Topic modeling has been a key problem for document analysis. One of the canonical approaches for topic modeling is Probabilistic Latent Semantic Indexing, which maximizes the join...
Deng Cai, Qiaozhu Mei, Jiawei Han, Chengxiang Zhai
Static index pruning techniques aim to remove from the posting lists of an inverted file the references to documents that are unlikely to be relevant for answering user querie...
The World Wide Web is a large, heterogeneous, distributed collection of documents connected by hypertext links. The most common technology currently used for searching the Web depend...
Alberto O. Mendelzon, George A. Mihaila, Tova Milo
This paper describes the development of a polyphonic music retrieval system with the n-gram approach. Musical n-grams are constructed from polyphonic musical performances in MIDI ...
Numerous approaches, including textual, structural and featural, to detecting duplicate documents have been investigated. Considering document images are usually stored and transm...