This paper presents a new enhanced text extraction algorithm from degraded document images on the basis of the probabilistic models. The observed document image is considered as a...
This paper presents a dynamic approach to document page segmentation based on inter-component relationships and their local features. State-of-the art page segmentation algorithms...
Probabilistic Latent Semantic Analysis (PLSA) has become a popular topic model for image clustering. However, the traditional PLSA method considers each image (document) independen...
Text clustering methods can be used to structure large sets of text or hypertext documents. The well-known methods of text clustering, however, do not really address the special p...
Abstract. A new methodology that structures the semantics of a collection of documents into the geometry of a simplicial complex is developed. A simplicial complex is topologically...