Sciweavers

Template-Based Information Mining from HTML Documents
DEXAW
1999
IEEE
13 years 11 months ago
An XML-Based, 3-Tier Scheme for Integrating Heterogeneous Information Sources to the WWW
The phenomenal growth that the WWW currently experiences necessitates the integration of various types of information sources to its platform. We present an open, extensible multi...
Costas Petrou, Stathes Hadjiefthymiades, Drakoulis...
WWW
2005
ACM
14 years 8 months ago
Extracting context to improve accuracy for HTML content extraction
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
Suhit Gupta, Gail E. Kaiser, Salvatore J. Stolfo
DKE
2006
13 years 7 months ago
Information extraction from structured documents using k-testable tree automaton inference
Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work on IE from structured documents, suc...
Raymond Kosala, Hendrik Blockeel, Maurice Bruynoog...
WWW
2005
ACM
14 years 8 months ago
Interactive web-wrapper construction for extracting relational information from web documents
In this paper, we propose a new user interface to interactively specify Web wrappers to extract relational information from Web documents. In this study, we focused on improving u...
Tsuyoshi Sugibuchi, Yuzuru Tanaka
CORIA
2011
12 years 11 months ago
Mining the Web for lists of Named Entities
Named entities play an important role in Information Extraction. They represent unitary namable information within text. In this work, we focus on groups of named entities of the s...
Arlind Kopliku, Mohand Boughanem, Karen Pinel-Sauv...