Typically, users interact with database systems by formulating queries. However, many times users do not have a clear understanding of their information needs or the exact content...
This paper offers a novel look at using a dimensionalityreduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
We study the problem of detecting coordinated free text campaigns in large-scale social media. These campaigns – ranging from coordinated spam messages to promotional and advert...
Kyumin Lee, James Caverlee, Zhiyuan Cheng, Daniel ...
As user demands become increasingly sophisticated, search engines today are competing in more than just returning document results from the Web. One area of competition is providi...
Crowdsourcing platforms offer unprecedented opportunities for creating evaluation benchmarks, but suffer from varied output quality from crowd workers who possess different levels...