In this paper, we report on the development of, and experiments with, IBM Content Harvester (CH), a tool to analyze and recover templates and content from word-processor-created text docume...
In this paper, we describe a system that can extract record structures from web pages with no direct human supervision. Records are commonly occurring HTML-embedded data tuples th...
Intel® Mash Maker is an interactive tool that tracks what the user is doing and tries to infer what information and visualizations they might find useful for their current task. M...
Robert Ennals, Eric A. Brewer, Minos N. Garofalaki...
In this paper, we propose a Web image search result organizing method to facilitate user browsing. We formalize this problem as a salient image region pattern extraction problem. ...
As the result of interactions between visitors and a web site, an HTTP log file contains very rich knowledge about users' on-site behaviors, which, if fully exploited, can better c...