Automated analysis of unstructured text documents (e.g., web pages, newswire articles, research publications, business reports) is a key capability for solving important problems ...
Daniel M. Dunlavy, Timothy M. Shead, Eric T. Stant...
This paper presents a survey on the major approaches to search interface understanding. The Deep Web consists of data that exist on the Web but are inaccessible via text search en...
Keyword indices, topic directories, and link-based rankings are used to search and structure the rapidly growing Web today. Surprisingly little use is made of years of browsing ex...
Abstract. When you search for information regarding a particular person on the web, a search engine returns many pages. Some of these pages may be for people with the same name. Ho...
This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...