Sciweavers

DOCENG
2009
ACM
Object-level document analysis of PDF files
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Tamir Hassan
DGO
2008
Ontology generation for large email collections
This paper presents a new approach to identifying concepts expressed in a collection of email messages, and organizing them into an ontology or taxonomy for browsing. It incorpora...
Hui Yang, Jamie Callan
WWW
2007
ACM
U-REST: an unsupervised record extraction system
In this paper, we describe a system that can extract record structures from web pages with no direct human supervision. Records are commonly occurring HTML-embedded data tuples th...
Yuan Kui Shen, David R. Karger
IEAAIE
2001
Springer
Selecting a Relevant Set of Examples to Learn IE-Rules
The growing availability of online text has led to an increase in the use of automatic knowledge acquisition approaches from textual data, as in Information Extraction (IE). Some ...
Jordi Turmo, Horacio Rodríguez
ACL
2006
A Collaborative Framework for Collecting Thai Unknown Words from the Web
We propose a collaborative framework for collecting Thai unknown words found on Web pages over the Internet. Our main goal is to design and construct a Web-based system which allow...
Choochart Haruechaiyasak, Chatchawal Sangkeettraka...