Sciweavers

WIDM
2004
ACM
14 years 6 days ago
Stylistic and lexical co-training for web block classification
Many applications which use web data extract information from a limited number of regions on a web page. As such, web page division into blocks and the subsequent block classifica...
Chee How Lee, Min-Yen Kan, Sandra Lai
WIDM
2004
ACM
14 years 6 days ago
Parsing concurrent XML
Ionut Emil Iacob, Alex Dekhtyar
WIDM
2004
ACM
14 years 6 days ago
Ctree: a compact tree for indexing XML data
In this paper, we propose a novel compact tree (Ctree) for XML indexing, which provides not only concise path summaries at the group level but also detailed child-parent links at ...
Qinghua Zou, Shaorong Liu, Wesley W. Chu
WIDM
2004
ACM
14 years 6 days ago
Next generation CiteSeer
CiteSeer began as the first search engine for scientific literature to incorporate Autonomous Citation Indexing, and has since grown to be a well-used, open archive for...
C. Lee Giles
WIDM
2004
ACM
14 years 6 days ago
Measuring similarity between collections of values
In this paper, we propose a set of similarity metrics for manipulating collections of values occurring in XML documents. Following the data model presented in TAX algebra, we treat...
Carina F. Dorneles, Carlos A. Heuser, Andrei E. N....
WIDM
2004
ACM
14 years 6 days ago
WISE-cluster: clustering e-commerce search engines automatically
In this paper, we propose a new approach to automatically clustering e-commerce search engines (ESEs) on the Web such that ESEs in the same cluster sell similar products. This all...
Qian Peng, Weiyi Meng, Hai He, Clement T. Yu
WIDM
2004
ACM
14 years 6 days ago
User evaluation of the NASA technical report server recommendation service
We present the user evaluation of two recommendation server methodologies implemented for the NASA Technical Report Server (NTRS). One methodology for generating recommendations u...
Michael L. Nelson, Johan Bollen, JoAnne R. Calhoun...
WIDM
2004
ACM
14 years 6 days ago
A version model for supporting adaptation of web pages
Maintenance of large Web sites is a complex task, similar in some sense to software maintenance. Content should be separated from the formatting rules, allowing independent develo...
Rodrigo Giacomini Moro, Renata de Matos Galante, C...
WIDM
2004
ACM
14 years 6 days ago
XPath lookup queries in P2P networks
We address the problem of querying XML data over a P2P network. In P2P networks, the allowed kinds of queries are usually exact-match queries over file names. We discuss the exte...
Angela Bonifati, Ugo Matrangolo, Alfredo Cuzzocrea...
WIDM
2004
ACM
14 years 6 days ago
Probabilistic models for focused web crawling
A focused crawler must use information gleaned from previously crawled page sequences to estimate the relevance of a newly seen URL. Therefore, good performance depends on powerfu...
Hongyu Liu, Evangelos E. Milios, Jeannette Janssen