This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation and models its performance. We show that with ...
We consider the problem of ranking refinement, i.e., to improve the accuracy of an existing ranking function with a small set of labeled instances. We are, particularly, intereste...
This paper presents a general framework for building classifiers that deal with short and sparse text & Web segments by making the most of hidden topics discovered from larges...
Most current banner advertising is sold through negotiation thereby incurring large transaction costs and possibly suboptimal allocations. We propose a new automated system for se...
Uriel Feige, Nicole Immorlica, Vahab S. Mirrokni, ...
Hypothesis generation is a crucial initial step for making scientific discoveries. This paper addresses the problem of automatically discovering interesting hypotheses from the we...
Typical web sessions can be hijacked easily by a network eavesdropper in attacks that have come to be designated "sidejacking." The rise of ubiquitous wireless networks,...
This paper describes a series of user studies on how people use the Web via mobile devices. The data primarily comes from contextual inquiries with 47 participants between 2004 an...
The combined efforts of human volunteers have recently extracted numerous facts from Wikipedia, storing them as machine-harvestable object-attribute-value triples in Wikipedia inf...
Wikis have proven to be a valuable tool for collaboration and content generation on the web. Simple semantics and ease-of-use make wiki systems well suited for meeting many emergi...
Ontology population is prone to cause inconsistency because the populating process is imprecise or the populated data may conflict with the original data. By assuming that the int...