Figures in digital documents contain important information. Current digital libraries do not summarize and index information available within figures for document retrieval. We pr...
Xiaonan Lu, James Ze Wang, Prasenjit Mitra, C. Lee...
Large collections of scanned documents (books and journals) are now available in Digital Libraries. The most common method for retrieving relevant information from these collectio...
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
Abstract. This paper describes the setup of the Book Structure Extraction competition run at ICDAR 2009. The goal of the competition was to evaluate and compare automatic technique...
Antoine Doucet, Gabriella Kazai, Bodin Dresevic, A...
Creating, maintaining, or using a digital library requires the manipulation of digital documents. Information workspaces provide a visual representation allowing users to collect,...
Frank M. Shipman III, Hao-wei Hsieh, J. Michael Mo...