We investigate the subtle cues to user identity that may be exploited in attacks on the privacy of users in web search query logs. We study the application of simple classifiers ...
We study the parallelization of the (record) linkage problem – i.e., to identify matching records between two collections of records, A and B. One of main idiosyncrasies of the ...
Index compression techniques are known to substantially decrease the storage requirements of a text retrieval system. As a side-effect, they may increase its retrieval performanc...
The term frequency normalisation parameter sensitivity is an important issue in the probabilistic model for Information Retrieval. A high parameter sensitivity indicates that a sl...
This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achi...