This paper describes a new approach towards detecting plagiarism and scientific documents that have been read but not cited. In contrast to existing approaches, which analyze docu...
This paper addresses the indexing and retrieval of mathematical symbols from digitized documents. The proposed approach exploits Shape Contexts (SC) to describe the shape of mathe...
Although clustering under constraints is a current research topic, a hierarchical setting, in which a hierarchy of clusters is the goal, is usually not considered. This paper trie...
In this paper, we propose a new approach to automatically clustering e-commerce search engines (ESEs) on the Web such that ESEs in the same cluster sell similar products. This all...
This paper presents the Topic-Aspect Model (TAM), a Bayesian mixture model which jointly discovers topics and aspects. We broadly define an aspect of a document as a characteristi...