In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
Today people typically read and annotate printed documents even if they are obtained from electronic sources like digital libraries. If there is a reason for them to share these p...
Abstract— Recent advances in graph-based search techniques derived from Kleinberg’s work [1] have been impressive. This paper further improves the graph-based search algorithm ...
Complex network analysis is a growing research area in a wide variety of domains and has recently become closely associated with data, text and web mining. One of the most active ...
Cristian Klen dos Santos, Alexandre Evsukoff, Beat...
This paper describes a new approach towards detecting plagiarism and scientific documents that have been read but not cited. In contrast to existing approaches, which analyze docu...