The generalized traveling salesman problem (GTSP) is an NP-hard problem that extends the classical traveling salesman problem by partitioning the nodes into clusters and looking fo...
This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
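As a rough illustration of the general simhash idea mentioned in the abstract above, the following minimal sketch computes a 64-bit fingerprint per document and compares fingerprints by Hamming distance. It is not taken from the paper: the whitespace tokenization, the MD5-based token hash, the unweighted votes, and the names `token_hash`, `simhash`, and `hamming` are all assumptions chosen for illustration.

```python
import hashlib

BITS = 64  # assumed fingerprint width


def token_hash(token: str) -> int:
    """Map a token to a 64-bit integer (first 8 bytes of its MD5 digest; an assumed choice)."""
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(tokens) -> int:
    """Build a fingerprint: each token votes +1 or -1 on every bit position."""
    votes = [0] * BITS
    for tok in tokens:
        h = token_hash(tok)
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set where the positive votes outnumber the negative ones.
    fingerprint = 0
    for i, v in enumerate(votes):
        if v > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


doc_a = "the quick brown fox jumps over the lazy dog".split()
doc_b = "the quick brown fox jumped over the lazy dog".split()
distance = hamming(simhash(doc_a), simhash(doc_b))
```

Documents whose fingerprints differ in only a few bits are treated as candidate near-duplicates. The distance threshold is a tuning choice and is not specified here.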
This paper describes an approach to digesting threads of archived discussion lists by clustering messages into approximate topical groups, and then extracting shorter overviews, a...
The aim of process mining is to identify and extract process patterns from data logs to reconstruct an overall process flowchart. As business processes become more and more comple...
We introduce a stricter Web community definition to overcome the boundary ambiguity of the Web community defined by Flake, Lawrence and Giles [2], and consider the problem of finding co...