Sciweavers

» Scalable Feature Extraction from Noisy Documents
PAKDD
2010
ACM
167 views, Data Mining
14 years 1 month ago
Resource-Bounded Information Extraction: Acquiring Missing Feature Values on Demand
We present a general framework for the task of extracting specific information “on demand” from a large corpus such as the Web under resource-constraints. Given a database wit...
Pallika Kanani, Andrew McCallum, Shaohan Hu
ICDE
2004
IEEE
117 views, Database
14 years 11 months ago
Probe, Cluster, and Discover: Focused Extraction of QA-Pagelets from the Deep Web
In this paper, we introduce the concept of a QA-Pagelet to refer to the content region in a dynamic page that contains query matches. We present THOR, a scalable and efficient min...
James Caverlee, Ling Liu, David Buttler
WWW
2006
ACM
14 years 10 months ago
POLYPHONET: an advanced social network extraction system from the web
Social networks play important roles in the Semantic Web: knowledge management, information retrieval, ubiquitous computing, and so on. We propose a social network extraction syst...
Hideaki Takeda, Junichiro Mori, Kôiti Hasida...
IJDAR
2002
108 views
13 years 9 months ago
Document understanding for a broad class of documents
We present a document analysis system able to assign logical labels and extract the reading order in a broad set of documents. All information sources, from geometric features and ...
Marco Aiello, Christof Monz, Leon Todoran
IPM
2002
106 views
13 years 9 months ago
A feature mining based approach for the classification of text documents into disjoint classes
This paper proposes a new approach for classifying text documents into two disjoint classes. The new approach is based on extracting patterns, in the form of two logical expressio...
Salvador Nieto Sánchez, Evangelos Triantaph...