The primary function of current Web search engines is essentially relevance ranking at the document level. However, myriad structured information about real-world objects is embed...
Zaiqing Nie, Yunxiao Ma, Shuming Shi, Ji-Rong Wen,...
Abstract. In this paper we present static and dynamic studies of duplicate and near-duplicate documents in the Web. The static and dynamic studies involve the analysis of similar c...
Mashups are situational applications that build data flows to link the contents of multiple Web sources. Often, ranking the results of a mashup is handled in a materializethe...
Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs rather than document collections, as self-contained sources o...
Business intelligence requires collecting and merging information from many different sources, both structured and unstructured, in order to analyse, for example, financial ...