In this paper we present a methodology to extract information from the Web to build a taxonomy of terms and Web resources for a given domain. This taxonomy represents a hierarchy o...
In this paper we explore the use of parsimonious language models for web retrieval. These models are smaller thus more efficient than the standard language models and are therefor...
The prevailing model for digital preservation is that archives should be similar to a “fortress”: a large, protective infrastructure built to defend a relatively small collect...
—“Big Data” in map-reduce (M-R) clusters is often fundamentally temporal in nature, as are many analytics tasks over such data. For instance, display advertising uses Behavio...
Badrish Chandramouli, Jonathan Goldstein, Songyun ...
Gradient Boosted Regression Trees (GBRT) are the current state-of-the-art learning paradigm for machine learned websearch ranking — a domain notorious for very large data sets. ...
Stephen Tyree, Kilian Q. Weinberger, Kunal Agrawal...