Behavioral targeting (BT) leverages historical user behavior to select the ads most relevant to users to display. The state-of-the-art of BT derives a linear Poisson regression model from fine-grained user behavioral data and predicts click-through rate (CTR) from user history. We designed and implemented a highly scalable and efficient solution to BT using Hadoop MapReduce framework. With our parallel algorithm and the resulting system, we can build above 450 BT-category models from the entire Yahoo's user base within one day, the scale that one can not even imagine with prior systems. Moreover, our approach has yielded 20% CTR lift over the existing production system by leveraging the well-grounded probabilistic model fitted from a much larger training dataset. Specifically, our major contributions include: (1) A MapReduce statistical learning algorithm and implementation that achieve optimal data parallelism, task parallelism, and load balance in spite of the typically skewed ...
Ye Chen, Dmitry Pavlov, John F. Canny