Sciweavers

368 search results - page 16 / 74
» Template-Based Information Mining from HTML Documents
SIGIR
2008
ACM
13 years 7 months ago
XML-aided phrase indexing for hypertext documents
We combine techniques of XML Mining and Text Mining for the benefit of Information Retrieval. By manipulating the word sequence according to the XML structure of the marked-up tex...
Miro Lehtonen, Antoine Doucet
ADC
2006
Springer
14 years 1 month ago
Peer-to-peer form based web information systems
The World Wide Web revolutionized the use of forms in everyday private and business life by allowing a move away from paper forms to easily accessible digital forms. Data captured...
Stijn Dekeyser, Jan Hidders, Richard Watson, Ron A...
ASP
2005
Springer
13 years 9 months ago
Exploiting ASP for Semantic Information Extraction
The paper describes HıLεX, a new ASP-based system for the extraction of information from unstructured documents. Unlike previous systems, which are mainly syntactic, Hı...
Massimo Ruffolo, Nicola Leone, Marco Manna, Domeni...
JOT
2008
13 years 7 months ago
The Stock Statistics Parser
This paper describes how to use the HTMLEditorKit to perform web data mining on stock statistics for listed firms. Our focus is on making use of the web to get information about comp...
Douglas Lyon
DOCENG
2009
ACM
14 years 1 month ago
Object-level document analysis of PDF files
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Tamir Hassan