The process of extracting useful knowledge from large datasets has become one of the most pressing problems in today’s society. The problem spans entire sectors, from scientists...
In this paper, we analyze queries and sessions intended to satisfy children’s information needs using a large-scale query log. The aim of this analysis is twofold: i) to identify...
Sergio Duarte Torres, Djoerd Hiemstra, Pavel Serdy...
In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
Randomness is being harnessed in the design of some interactive systems. This is observed in random blogs, random web searching, and in particular Apple's iPod Shuffle. Yet t...
Open Source communities typically use a software repository to archive various software projects with their source code, mailing list discussions, documentation, bug reports, and ...
Shinji Kawaguchi, Pankaj K. Garg, Makoto Matsushit...