Building NLG systems, in particular statistical ones, requires parallel data (paired inputs and outputs), which do not generally occur naturally. In this paper, we investigate the ...
This paper describes how to use the HTMLEditorKit to perform web data mining on EDGAR (Electronic Data-Gathering, Analysis, and Retrieval system). EDGAR is the SEC's (U.S. Secur...
In this work we try to bridge the gap often encountered by researchers who find themselves with few or no labeled examples from their desired target domain, yet still have access ...
In the paper we present a methodology for the semiautomated extraction of ontological knowledge from XML data sources in a given domain. We consider an interconnection scenario ov...
Silvana Castano, Valeria De Antonellis, Sabrina De...
Anchor text has been shown to be effective in ranking [6] and a variety of information retrieval tasks on web pages. Some authors have expanded on anchor text by using the words ar...