We combine techniques of XML Mining and Text Mining for the benefit of Information Retrieval. By manipulating the word sequence according to the XML structure of the marked-up tex...
The World Wide Web revolutionized the use of forms in everyday private and business life by allowing a move away from paper forms to easily accessible digital forms. Data captured...
Stijn Dekeyser, Jan Hidders, Richard Watson, Ron A...
The paper describes HıLεX, a new ASP-based system for the extraction of information from unstructured documents. Unlike previous systems, which are mainly syntactic, Hı...
Massimo Ruffolo, Nicola Leone, Marco Manna, Domeni...
This paper describes how to use the HTMLEditorKit to perform web data mining on stock statistics for listed firms. Our focus is on making use of the web to get information about comp...
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...