A growing trend in commercial search engines is the display of specialized content, such as news and products, interleaved with web search results. Ideally, this content should b...
A major obstacle to the construction of a probabilistic translation model is the lack of large parallel corpora. In this paper we first describe a parallel text mining system that...
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. Few users wish to retri...
We propose a Web search site called "Cyclone", in which a user can retrieve encyclopedic term descriptions on the Web. Cyclone searches the Web for headwords and page fr...
In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the...