We study how best to use crowdsourced relevance judgments for learning to rank [1, 7]. We integrate two lines of prior work: modeling unreliable crowd-based binary annotation for binary classification [5, 3] and aggregating graded relevance judgments from reliable experts for ranking [7]. To model the varying performance of crowd annotators, we simulate annotation noise of varying magnitude and with varying distributional properties. Evaluation on three LETOR test collections reveals a striking trend contrary to prior studies: single labeling outperforms consensus methods in maximizing learner accuracy relative to annotator effort. We also observe surprising consistency of the learning curve across noise distributions, and find that adversarial noise poses a greater challenge in the multi-class labeling setting.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]

General Terms
Algorithms, Design, Experimentation, Performance

Keywords
Crowdsourcing, learning to rank, active learning
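As an illustrative sketch only (not the authors' actual simulation protocol), the following Python snippet shows one way to inject annotation noise of varying magnitude and distribution into graded relevance labels. The function name corrupt_labels, the three noise models, and their exact definitions are assumptions made for illustration.

    import numpy as np

    def corrupt_labels(true_labels, noise_rate, noise_model="uniform",
                       num_grades=3, rng=None):
        """Simulate crowd annotation noise on graded relevance labels
        (0 .. num_grades-1). Illustrative assumption, not the paper's method.

        noise_rate   -- probability that an annotator mislabels a document
        noise_model  -- 'uniform'     : wrong grade drawn uniformly from the other grades
                        'adjacent'    : wrong grade is one step above or below the true grade
                        'adversarial' : wrong grade is the reverse of the true grade
        """
        rng = np.random.default_rng(rng)
        labels = np.asarray(true_labels).copy()
        flip = rng.random(labels.shape) < noise_rate  # which judgments get corrupted

        for i in np.flatnonzero(flip):
            y = labels[i]
            if noise_model == "uniform":
                labels[i] = rng.choice([g for g in range(num_grades) if g != y])
            elif noise_model == "adjacent":
                # step toward a valid neighboring grade
                if y == 0:
                    step = 1
                elif y == num_grades - 1:
                    step = -1
                else:
                    step = rng.choice([-1, 1])
                labels[i] = y + step
            elif noise_model == "adversarial":
                labels[i] = num_grades - 1 - y  # invert the grade scale
            else:
                raise ValueError(f"unknown noise model: {noise_model}")
        return labels

    # Example: corrupt 30% of 3-grade relevance labels with adversarial noise.
    true_labels = np.array([0, 1, 2, 2, 0, 1, 0, 2])
    noisy = corrupt_labels(true_labels, noise_rate=0.3,
                           noise_model="adversarial", rng=0)

Varying noise_rate controls the magnitude of the noise, while switching noise_model changes its distributional properties, which is the kind of sweep the simulation described above would require.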