Crowdsourcing platforms offer unprecedented opportunities for creating evaluation benchmarks, but suffer from varied output quality from crowd workers who possess different levels...
Typically, users interact with database systems by formulating queries. However, many times users do not have a clear understanding of their information needs or the exact content...
Meaningful evaluation of web search must take account of spam. Here we conduct a user experiment to investigate whether satisfaction with search engine result pages as a whole is ...
Timothy Jones, David Hawking, Paul Thomas, Ramesh ...
A framework is presented for discovering partial duplicates in large collections of scanned books with optical character recognition (OCR) errors. Each book in the collection is r...
With hundreds of millions of participants, social media services have become commonplace. Unlike a traditional social network service, a microblogging network like Twitter is a hy...
As user demands become increasingly sophisticated, search engines today are competing in more than just returning document results from the Web. One area of competition is providi...
We introduce a generative probabilistic document model based on latent Dirichlet allocation (LDA), to deal with textual errors in the document collection. Our model is inspired by...
We consider data exchange for XML documents: given source and target schemas, a mapping between them, and a document conforming to the source schema, construct a target document a...
We study the problem of detecting coordinated free text campaigns in large-scale social media. These campaigns – ranging from coordinated spam messages to promotional and advert...
Kyumin Lee, James Caverlee, Zhiyuan Cheng, Daniel ...