Traditionally, research in identifying structured entities in documents has proceeded independently of document categorization research. In this paper, we observe that these two t...
A user-centric entity detection system is one in which the primary consumer of the detected entities is a person who can perform actions on the detected entities (e.g. perform a s...
The classical (ad hoc) document retrieval problem has been traditionally approached through ranking according to heuristically developed functions (such as tf.idf or bm25) or gene...
Topic modeling has been a key problem for document analysis. One of the canonical approaches for topic modeling is Probabilistic Latent Semantic Indexing, which maximizes the join...
Deng Cai, Qiaozhu Mei, Jiawei Han, Chengxiang Zhai
Background: Extracting Protein-Protein Interactions (PPI) from research papers is a way of translating information from English to the language used by the databases that store th...