We describe a joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is...
: We describe our participation in the TREC 2004 Web and Terabyte tracks. For the web track, we employ mixture language models based on document full-text, incoming anchortext, and...
Microformats and semantic XHTML add semantics to web pages while taking advantage of the existing (X)HTML infrastructure. This approach enables new applications that can be deploy...
This paper addresses the issue of Web document summarization. As textual content of Web documents is often scarce or irrelevant and existing summarization techniques are based on ...
Abstract. There is a significant commercial and research interest in locationbased web search engines. Given a number of search keywords and one or more locations that a user is in...