This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...
Software engineers face a difficult task in managing the many different types of relationships that exist between the documents of a software development project. We refer to this...
This paper addresses the challenging problem of similarity search over widely distributed, ultra-high-dimensional data. One such application is retrieval of the top-k most similar d...
Given an author-conference network that evolves over time, which conferences is a given author most closely related to, and how do they change over time? Large time...
Hanghang Tong, Spiros Papadimitriou, Philip S. Yu,...
How do real, weighted graphs change over time? What patterns, if any, do they obey? Earlier studies focus on unweighted graphs, and, with few exceptions, they focus on static snap...