The dynamic marketplace in online advertising calls for ranking systems that are optimized to consistently promote and capitalize better performing ads. The streaming nature of online data inevitably makes an advertising system choose between maximizing its expected revenue according to its current knowledge in short term (exploitation) and trying to learn more about the unknown to improve its knowledge (exploration), since the latter might increase its revenue in the future. The exploitation and exploration (EE) tradeoff has been extensively studied in the reinforcement learning community, however, not been paid much attention in online advertising until recently. In this paper, we develop two novel EE strategies for online advertising. Specifically, our methods can adaptively balance the two aspects of EE by automatically learning the optimal tradeoff and incorporating confidence metrics of historical performance. Within a deliberately designed offline simulation framework we apply ...