Identifying approximately duplicate objects in databases is an essential step in the information integration process. Most existing approaches have relied on gener...
Click data captures many users’ document preferences for a query and has been shown to help significantly improve search engine ranking. However, most click data is noisy and of...
Combining the ranked preferences of many experts is an old and surprisingly deep problem that has gained renewed importance in many machine learning, data mining, a...
A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied. We propose a new “normalized in...
Ming Li, Xin Chen, Xin Li, Bin Ma, Paul M. B. Vit&...
Document clustering techniques mostly rely on single term analysis of the document data set, such as the Vector Space Model. To better capture the structure of documents, the unde...