Sciweavers

» Crawling the Hidden Web
WWW
2005
ACM
16 years 4 months ago
The Infocious Web search engine: improving web searching through linguistic analysis
In this paper we present the Infocious Web search engine [23]. Our goal in creating Infocious is to improve the way people find information on the Web by resolving ambiguities pre...
Alexandros Ntoulas, Gerald Chao, Junghoo Cho
PVLDB
2008
15 years 3 months ago
WebTables: exploring the power of tables on the web
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
JCDL
2006
ACM
15 years 9 months ago
Building a research library for the history of the web
This paper describes the building of a research library for studying the Web, especially research on how the structure and content of the Web change over time. The library is part...
William Y. Arms, Selcuk Aya, Pavel Dmitriev, Blaze...
MM
2004
ACM
15 years 9 months ago
Multi-model similarity propagation and its application for web image retrieval
In this paper, we propose an iterative similarity propagation approach to explore the inter-relationships between Web images and their textual annotations for image retrieval. By ...
Xin-Jing Wang, Wei-Ying Ma, Gui-Rong Xue, Xing Li
LAWEB
2003
IEEE
15 years 8 months ago
On the Evolution of Clusters of Near-Duplicate Web Pages
This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...
Dennis Fetterly, Mark Manasse, Marc Najork