Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
Abstract. The mathematical concept of document resemblance captures well the informal notion of syntactic similarity. The resemblance can be estimated using a fixed size “sketch...
While efficient graph-based representations have been developed for modeling combinations of low-level network attacks, relatively little attention has been paid to effective tech...
Steven Noel, Michael Jacobs, Pramod Kalapa, Sushil...
Databases of text and text-annotated data constitute a significant fraction of the information available in electronic form. Searching and browsing are the typical ways that users...
Wisam Dakka, Panagiotis G. Ipeirotis, Kenneth R. W...
—We introduce a novel set of social network analysis based algorithms for mining the Web, blogs, and online forums to identify trends and find the people launching these new tren...
Peter A. Gloor, Jonas Krauss, Stefan Nann, Kai Fis...