Sciweavers

194 search results for "Extracting Schema from Semistructured Data" (page 35 of 39)
EMNLP
2009
13 years 5 months ago
Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment
Traditionally, machine learning approaches for information extraction require human annotated data that can be costly and time-consuming to produce. However, in many cases, there ...
Kedar Bellare, Andrew McCallum
WEBI
2007
Springer
14 years 1 month ago
Question Answering over Implicitly Structured Web Content
Implicitly structured content on the Web such as HTML tables and lists can be extremely valuable for web search, question answering, and information retrieval, as the implicit str...
Eugene Agichtein, Chris Burges, Eric Brill
KDD
2007
ACM
14 years 8 months ago
Joint optimization of wrapper generation and template detection
Many websites have large collections of pages generated dynamically from an underlying structured source like a database. The data of a category are typically encoded into similar...
Shuyi Zheng, Ruihua Song, Ji-Rong Wen, Di Wu
DGO
2007
13 years 9 months ago
D-HOTM: distributed higher order text mining
We present D-HOTM, a framework for Distributed Higher Order Text Mining based on named entities extracted from textual data that are stored in distributed relational databases. Unl...
William M. Pottenger
ICDCS
2002
IEEE
14 years 20 days ago
A Fully Distributed Framework for Cost-Sensitive Data Mining
Data mining systems aim to discover patterns and extract useful information from facts recorded in databases. A widely adopted approach is to apply machine learning algorithms to ...
Wei Fan, Haixun Wang, Philip S. Yu, Salvatore J. S...