The subject of this paper is the semi-automatic construction of taxonomies over the Web. We address the problem of discovering high-quality resources that belong in a particular n...
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopala...
Indexing schemes for semistructured data have been developed in recent years to optimize path query processing by summarizing path information. However, most of these schemes can ...
In this paper, we propose to use database technology to improve performance of web proxy servers. We view the cache at a proxy server as a web warehouse with data organized in a h...
Web crawlers generate significant loads on Web servers, and are difficult to operate. Instead of running crawlers at many “client” sites, we propose a central crawler and We...
For speed and convenience, applications routinely cache XML data locally, and access it through standard parser (SAX) or tree (DOM) interfaces. When the source of this data is a r...
Dynamic content generation poses huge resource demands on web servers, creating a scalability problem. WebView Materialization, where web pages are cached and constantly refreshed...
Synthetically generated data has always been important for evaluating and understanding new ideas in database research. In this paper, we describe a data generator for generating ...