Sciweavers

WWW
2011
ACM

Parallel boosted regression trees for web search ranking

Gradient Boosted Regression Trees (GBRT) are the current state-of-the-art learning paradigm for machine-learned web search ranking, a domain notorious for very large data sets. In this paper, we propose a novel method for parallelizing the training of GBRT. Our technique parallelizes the construction of the individual regression trees and operates using the master-worker paradigm as follows. The data are partitioned among the workers. At each iteration, each worker summarizes its data partition using histograms. The master processor uses these histograms to build one layer of a regression tree and then sends this layer to the workers, allowing the workers to build histograms for the next layer. Our algorithm carefully orchestrates overlap between communication and computation to achieve good performance. Since this approach is based on data partitioning and requires only a small amount of communication, it generalizes to distributed and shared memory machines, as well as clouds. We present experim...
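The master-worker step described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names (worker_histogram, merge_histograms, master_best_split), the fixed-width binning over an assumed feature range [lo, hi], and the restriction to a single root node are all assumptions made for illustration. Each "worker" summarizes its partition as per-bin counts and target sums, and the "master" merges those summaries and picks the split that minimizes squared error.

```python
from typing import List, Optional, Tuple

Row = Tuple[List[float], float]   # (feature vector, regression target)
Hist = List[List[List[float]]]    # hist[feature][bin] = [count, target_sum]


def worker_histogram(rows: List[Row], n_features: int, n_bins: int,
                     lo: float, hi: float) -> Hist:
    """Hypothetical worker step: summarize one data partition per feature and bin."""
    hist = [[[0.0, 0.0] for _ in range(n_bins)] for _ in range(n_features)]
    width = (hi - lo) / n_bins  # assumption: fixed-width bins over [lo, hi]
    for x, y in rows:
        for f in range(n_features):
            b = min(max(int((x[f] - lo) / width), 0), n_bins - 1)
            hist[f][b][0] += 1
            hist[f][b][1] += y
    return hist


def merge_histograms(hists: List[Hist]) -> Hist:
    """Hypothetical master step: add the workers' histograms bin by bin."""
    merged = [[[0.0, 0.0] for _ in feat] for feat in hists[0]]
    for h in hists:
        for f, feat in enumerate(h):
            for b, (n, s) in enumerate(feat):
                merged[f][b][0] += n
                merged[f][b][1] += s
    return merged


def master_best_split(hist: Hist, lo: float, hi: float) -> Optional[Tuple[int, float]]:
    """Pick (feature, threshold) maximizing S_L^2/N_L + S_R^2/N_R,
    which is equivalent to minimizing the squared error of the two children."""
    best_gain, best = float("-inf"), None
    for f, feat in enumerate(hist):
        width = (hi - lo) / len(feat)
        total_n = sum(n for n, _ in feat)
        total_s = sum(s for _, s in feat)
        left_n = left_s = 0.0
        for b in range(len(feat) - 1):      # candidate split after bin b
            left_n += feat[b][0]
            left_s += feat[b][1]
            right_n = total_n - left_n
            if left_n == 0 or right_n == 0:
                continue                    # skip splits with an empty child
            right_s = total_s - left_s
            gain = left_s ** 2 / left_n + right_s ** 2 / right_n
            if gain > best_gain:
                best_gain, best = gain, (f, lo + (b + 1) * width)
    return best


# Toy data: two partitions; only feature 1 separates the targets.
parts = [
    [([0.1, 0.1], 0.0), ([0.9, 0.2], 0.0)],
    [([0.2, 0.8], 1.0), ([0.8, 0.9], 1.0)],
]
merged = merge_histograms([worker_histogram(p, 2, 4, 0.0, 1.0) for p in parts])
split = master_best_split(merged, 0.0, 1.0)
```

In this sketch only the histograms cross the worker-master boundary, so the amount of data exchanged depends on the number of features and bins rather than on the number of training rows, which is the property the abstract attributes to the approach. A full tree layer would repeat the same summary for every node in the current layer.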
Stephen Tyree, Kilian Q. Weinberger, Kunal Agrawal
Added 15 May 2011
Updated 15 May 2011
Type Journal
Year 2011
Where WWW
Authors Stephen Tyree, Kilian Q. Weinberger, Kunal Agrawal, Jennifer Paykin