— The extension approach of frequent itemset mining can be applied to discover the relations among documents. Several schemes, i.e., n-gram, stemming, stopword removal and term w...
−Document clustering has become an increasingly important task in analyzing huge numbers of documents distributed among various sites. The challenging aspect is to analyze this e...
The latent topic model plays an important role in the unsupervised learning from a corpus, which provides a probabilistic interpretation of the corpus in terms of the latent topic...
In this paper we propose to define a measure of visual similarity to compare different pages in a corpus. This measure is based on the analysis of the visual layout saliency of th...
Indexing file systems is a powerful means of helping users locate documents, software, and other types of data among large repositories. In environments that contain many differen...