These days, billions of Web pages are created with HTML or other markup languages. They only have a few uniform structures and contain various authoring styles compared to traditi...
The rapid growth of the World Wide Web and the Internet has fueled interest in Web services and the Semantic Web, which are quickly becoming important parts of modern electronic c...
Developing effective content recognition methods for diverse imagery continues to challenge computer vision researchers. We present a new approach for document image content catego...
Guangyu Zhu, Xiaodong Yu, Yi Li, David S. Doermann
PixED (from Pixel to Electronic Document) is aimed at converting document images into structured electronic documents which can be read by a machine for information retrieval. The...
Abstract. Content retrieval of scanned line drawing images is a difficult problem, especially from real-life large scale databases. Existing algorithms don’t work well due to th...