This paper presents Multilingual Document Clustering (MDC) on comparable corpora. Wikipedia, a structured multilingual knowledge base, has been highly exploited in many monolingual...
The presentation of search results on the web has been dominated by the textual form of document representation. On the other hand, the document's visual aspects such as the ...
Abstract. In this paper the suitability of different document representations for automatic document classification is compared, investigating a whole range of representations be...
Information Retrieval (IR) systems are built with different goals in mind. Some IR systems target high precision that is to have more relevant documents on the first page of their...
Several information organization, access, and filtering systems can benefit from different kind of document representations than those used in traditional Information Retrieval (I...
In this paper we describe work relating to classification of web documents using a graph-based model instead of the traditional vector-based model for document representation. We ...
Adam Schenker, Mark Last, Horst Bunke, Abraham Kan...
Document representation and indexing is a key problem for document analysis and processing, such as clustering, classification and retrieval. Conventionally, Latent Semantic Index...
Different from familiar clustering objects, text documents have sparse data spaces. A common way of representing a document is as a bag of its component words, but the semantic re...
In traditional text clustering methods, documents are represented as "bags of words" without considering the semantic information of each document. For instance, if two ...
Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, ...