Abstract. In this paper we present static and dynamic studies of duplicate and near-duplicate documents on the Web. The static and dynamic studies involve the analysis of similar c...
News reports are being produced and disseminated in overwhelming volume, making it difficult to keep up with the newest information. Most previous research in automatic news organ...
The high cost of locating faults in programs has motivated the development of techniques that assist in fault localization by automating part of the process of searching for fault...
We are experiencing an unprecedented increase in content contributed by users in forums such as blogs, social networking sites, and microblogging services. Such abundance of conten...
This paper proposes a novel application of a statistical language model to opinionated document retrieval targeting weblogs (blogs). In particular, we explore the use of the trigg...