When text categorization is applied to complex tasks, it is tedious and expensive to hand-label the large amounts of training data necessary for good performance. In this paper, we...
In the ocean of Web data, Web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized sys...
Ricardo A. Baeza-Yates, Carlos Castillo, Flavio Ju...
The web's hyperlinks are notoriously brittle, and break whenever a resource migrates. One solution to this problem is a transparent resource migration mechanism, which separa...
Abstract: Peer data management systems (PDMS) are the natural extension of integrated information systems. Conventionally, a single integrating system manages an integrated schema,...
Ralf Heese, Sven Herschel, Felix Naumann, Armin Ro...
Java applets have been used increasingly on web sites to perform client-side processing and provide dynamic content. While many web site analysis tools are available, their focus ...
Jeffrey L. Korn, Yih-Farn Chen, Eleftherios Koutso...