As part of a large effort to acquire large repositories of facts from unstructured text on the Web, a seed-based framework for textual information extraction allows for weakly sup...
We present a browser-extending Semantic Web extraction system that maps HTML documents to tables and, where possible, to rules. First, the basic data extractor ViPER distills and ...
Nowadays, images have become widely available on the World Wide Web (WWW). It’s essential to develop effective ways for managing and retrieving such abundant images. Advantageou...
Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs rather than document collections, as self-contained sources o...
As XML has become an emerging standard for information exchange on the World Wide Web, it has gained attention in database communities to extract information from XML seen as a dat...
Tae-Sun Chung, Sangwon Park, Sang-Yong Han, Hyoung...