In this paper, we propose a method for mediatory summarization, which is a novel technique for facilitating users' assessments of the credibility of information on the Web. A...
In this paper we propose a completely unsupervised method for open-domain entity extraction and clustering over query logs. The underlying hypothesis is that classes defined by mi...
Cross Document Coreference (CDC) is the task of constructing the coreference chain for mentions of a person across a set of documents. This work offers a holistic view of using do...
Jian Huang 0002, Pucktada Treeratpituk, Sarah M. T...
This is a system demo for a set of tools for translating texts between multiple languages in real time with high quality. The translation works on restricted languages, and is bas...
WebScript is a scripting language for processing Web documents. Designed as an extension to Jacl, the Java implementation of Tcl, WebScript allows programmers to manipulate HTML i...
The paper argues for the use of general and intuitive knowledge representation languages (and simpler notational variants, e.g. subsets of natural languages) for indexing the cont...
The paper presents an approach to classifying Web documents into large topic ontology. The main emphasis is on having a simple approach appropriate for handling a large ontology an...
Prolog is an excellent tool for representing and manipulating data written in formal languages as well as natural language. Its safe semantics and automatic memory management make...
Jan Wielemaker, Zhisheng Huang, Lourens van der Me...
One of the most important aspects of a Web document is its up-to-dateness or recency. Up-to-dateness is particularly relevant to Web documents because they usually contain content...
SimRank has been proposed to rank web documents based on a graph model on hyperlinks. The existing techniques for conducting SimRank computation adopt an iteration computation para...