The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
We discuss how Case Based Reasoning (CBR) (see e.g. [1], [4]) philosophy of adaptation of some known situations to new similar ones can be realized in rough set framework [5] for c...
We present a new approach to extracting information from unstructured documents based on an application ontology that describes a domain of interest. Starting with such an ontolog...
David W. Embley, Douglas M. Campbell, Randy D. Smi...
Graphical components information extraction is a crucial step in the chart recognition and understanding process. However, existing methods of information extraction from chart im...
Automatic extraction of human opinions from Web documents has been receiving increasing interest. To automate the process of opinion extraction, having a collection of evaluative ...