Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-...
We describe an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus. Most earlier approaches to this problem rely on combining cluste...
Bhavana Bharat Dalvi, William W. Cohen, Jamie Call...
This paper presents ListWebQA, a question answering system aimed specifically at extracting answers to list questions exclusively from web snippets. Answers are identifi...
The flexibility to react to rapidly changing environmental conditions has become a key factor in the economic success of any company. The competitiveness of an ente...
The paper investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. To automate the wrapper generation and the data extracti...
Valter Crescenzi, Giansalvatore Mecca, Paolo Meria...