A critical problem in developing information agents for the Web is accessing data that is formatted for human use. We have developed a set of tools for extracting data from web si...
Craig A. Knoblock, Kristina Lerman, Steven Minton,...
Recent kernel-based PPI extraction systems achieve promising performance because of their capability to capture structural syntactic information, but at the expense of computation...
Georeferenced information is growing every day, and geographical information systems are becoming crucial in many decision processes. As a consequence, extracting knowledge from G...
This paper presents PDF-TREX, a heuristic approach for table recognition and extraction from PDF documents. The heuristic starts from an initial set of basic content elements an...
In this paper, we propose a new architecture that can extract information about numerous objects in an image at high speed. Various characteristics can be obtained from the image m...