Exploiting lexical and semantic relationships in large unstructured text collections can significantly enhance managing, integrating, and querying information locked in unstructur...
Coreferencing entities across documents in a large corpus enables advanced document understanding tasks such as question answering. This paper presents a novel cross document core...
Jian Huang 0002, Sarah M. Taylor, Jonathan L. Smit...
In this paper, we present a semi-supervised learning method for web page classification, leveraging click logs to augment training data by propagating class labels to unlabeled si...
Soo-Min Kim, Patrick Pantel, Lei Duan, Scott Gaffn...
While multimedia documents are sequentially presented to users, an information filtering (IF) system is useful to achieve a good retrieval performance in terms of both quality and ...
Dianhui Wang, Xiaodi Huang, Yong-Soo Kim, Joon Shi...
We describe a compression model for semistructured documents, called Structural Contexts Model (SCM), which takes advantage of the context information usually implicit in the stru...