PixED (from Pixel to Electronic Document) is aimed at converting document images into structured electronic documents which can be read by a machine for information retrieval. The...
This paper focuses on the information management systems of the dynamic World Wide Web. Many individual web pages, such as news portals, provide periodic information and public an...
Postcorrection of OCR-results for text documents is usually based on electronic dictionaries. When scanning texts from a specific thematic area, conventional dictionaries often m...
Christian M. Strohmaier, Christoph Ringlstetter, K...
Abstract. Data intensive information is often published on the internet in the format of HTML tables. Extracting some of the information that is of users’ interest from the inter...
Jixue Liu, Zhuoyun Ao, Ho-Hyun Park, Yongfeng Chen
The increasing availability of high performance, low priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for docum...