Tracking new topics, ideas, and "memes" across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long ...
Corruption of data by class-label noise is an important practical concern impacting many classification problems. Studies of data cleaning techniques often assume a uniform label ...
This paper explores an important and relatively unstudied quality measure of a sponsored search advertisement: bounce rate. The bounce rate of an ad can be informally defined as t...
D. Sculley, Robert G. Malkin, Sugato Basu, Roberto...
We introduce a new approach to analyzing click logs by examining both the documents that are clicked and those that are bypassed: documents returned higher in the ordering of the ...
Atish Das Sarma, Sreenivas Gollapudi, Samuel Ieong
Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-...