This paper reports a statistical identification technique that differentiates scripts and languages in degraded and distorted document images. We identify scripts and languages th...
This paper presents a new approach for automatic document categorization. Exploiting the logical structure of the document, our approach assigns an HTML document to one or more cate...
Management and retrieval of large volumes of text can be expensive in both space and time. Moreover, the range of document sizes in a large collection such as TREC presents difficu...
Alistair Moffat, Ron Sacks-Davis, Ross Wilkinson, ...
This paper presents a graphical approach to model XML documents based on a Data Type Documentation called Graphical Notations-Data Type Documentation (GN-DTD). GN-DTD allows us...
With the WEBSOM method a textual document collection may be organized onto a graphical map display that provides an overview of the collection and facilitates interactive browsing...
Samuel Kaski, Timo Honkela, Krista Lagus, Teuvo Ko...