Topic modeling has been a key problem for document analysis. One of the canonical approaches for topic modeling is Probabilistic Latent Semantic Indexing, which maximizes the join...
Deng Cai, Qiaozhu Mei, Jiawei Han, Chengxiang Zhai
Although PageRank has been designed to estimate the popularity of Web pages, it is a general algorithm that can be applied to the analysis of other graphs other than one of hypert...
—The italic detection and slant rectification is a key step of optical character recognition (OCR). In this paper, a novel method is proposed to detect and rectify italic charact...
High-level spoken document analysis is required in many applications seeking access to the semantic content of audio data, such as information retrieval, machine translation or au...
Julien Fayolle, Fabienne Moreau, Christian Raymond...
Search engines that support structured documents typically support structure created by the author (e.g., title, section), and may also support structure added by an annotation pr...