This work addresses stop word detection in the context of document image retrieval in the compressed domain. Algorithms are presented to identify text lines and words an...
Interpreting legacy XML documents is a great challenge for realizing the vision of the Semantic Web (SW). This paper presents an algorithm to transform XML data into RDF- foundati...
Development of documents in multiple media involves activities in three different fields: the technical, the discursive, and the procedural. The major development problems of art...
Documents often have inherently parallel structure: they may consist of a text and commentaries, or an abstract and a body, or parts presenting alternative views on the same problem. Reve...
In this paper, we propose a tree-structured multiclass classifier to identify annotations and overlapping text in machine-printed documents. Each node of the tree-structured cla...