In this paper we propose a multimedia categorization framework that is able to exploit information across different parts of a multimedia document (e.g., a Web page, a PDF, a Micr...
The problem of document classification considers categorizing or grouping of various document types. Each document can be represented as a bag of words, which has no straightforw...
Abstract— With XML becoming a standard for data representation and exchange, we can expect to see large scale repositories and warehouses of XML data. In order for users to under...
Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this pape...
Anna Huang, David N. Milne, Eibe Frank, Ian H. Wit...
A large amount of handwritten documents exist in image form, as scanned documents. The supporting electronic media allows for better preservation, but to access their content they...
Antonio Clavelli, Luigi P. Cordella, Claudio De St...