Extracting knowledge from unstructured text is a long-standing goal of NLP. Although learning approaches to many of its subtasks have been developed (e.g., parsing, taxonomy induc...
Every piece of textual data is generated as a method to convey its authors' opinion regarding specific topics. Authors deliberately organize their writings and create links, ...
Huajing Li, Zaiqing Nie, Wang-Chien Lee, C. Lee Gi...
Unlike conventional data or text, Web pages typically contain a large amount of information that is not part of the main contents of the pages, e.g., banner ads, navigation bars, ...
Mixture models form one of the most widely used classes of generative models for describing structured and clustered data. In this paper we develop a new approach for the analysis...
A technique is presented that uses visual relationships between word images in a document to improve the recognition of the text it contains. This technique takes advantage of the...