The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Abstract. A new methodology that structures the semantics of a collection of documents into the geometry of a simplicial complex is developed. A simplicial complex is topologically...
The analysis of the hyperlink structure of the web has led to significant improvements in web information retrieval. This survey describes two successful link analysis algorithms ...
Table detection is always an important task of document analysis and recognition. In this paper, we propose a novel and effective table detection method via visual separators an...
Jing Fang, Liangcai Gao, Kun Bai, Ruiheng Qiu, Xin...
Abstract. This paper presents a study of 25 structural features extracted from samples of grapheme 'th' that correspond to features commonly used by forensic document examiner...