Conventionally, Web pages have been recognized as documents described by HTML. Image data, such as photographs, logos, maps, illustrations, and decorated text, have been treated a...
In a Web database that dynamically provides information in response to user queries, two distinct schemas, interface schema (the schema users can query) and result schema (the sch...
Jiying Wang, Ji-Rong Wen, Frederick H. Lochovsky, ...
This paper outlines our approach to the creation of annotated corpora for the purposes of Web Information Extraction, and presents the Web Annotation tool. This tool enables the a...
We describe a browser for the past web. It can retrieve data from multiple past web resources and features a passive browsing style based on change detection and presentation. The...
Adam Jatowt, Yukiko Kawai, Satoshi Nakamura, Yutak...
In this paper, we undertake a large-scale study of online user behavior based on search and toolbar logs. We propose a new CCS taxonomy of pageviews consisting of Content (news, p...