Topic modeling techniques have widespread use in text data mining applications. Some applications use batch models, which perform clustering on the document collection in aggregat...
The number of XML documents produced and available on the Internet is steadily increasing. It is thus important to devise automatic procedures to extract useful information fro...
Francesca Trentini, Markus Hagenbuchner, Alessandr...
In this paper we suggest a new approach to represent text document collections, integrating background knowledge to improve clustering effectiveness. Background knowled...
Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this pape...
Anna Huang, David N. Milne, Eibe Frank, Ian H. Wit...
This paper proposes a novel approach to measuring XML document similarity by taking into account the semantics between XML elements. The motivation of the proposed approach is to ...