To find near-duplicate documents, fingerprint-based paradigms such as Broder's shingling and Charikar's simhash algorithms have been recognized as effective approaches a...
Time is an important dimension of relevance for a large number of searches, such as over blogs and news archives. So far, research on searching over such collections has largely f...
Wisam Dakka, Luis Gravano, Panagiotis G. Ipeirotis
In the past, quite a few fast algorithms have been developed to mine frequent patterns over graph data, with the large spectrum covering many variants of the problem. However, the...
Recent growth of social classification systems due to steadily increasing popularity has established a multitude of heterogeneous isolated, non-integrated, and non-interoperable t...
Topic models such as aspect model or LDA have been shown as a promising approach for text modeling. Unlike many previous models that restrict each document to a single topic, topi...