While supervised learning approaches to 3D shape retrieval have successfully incorporated human knowledge about object classes using global shape features, incorporating local features remains difficult. First, it is not obvious how to measure the similarity between two objects that are each represented by a set of local features. Second, it is not clear how to choose local feature scales so that the features are most distinctive. In this paper, we tackle both problems and present a supervised learning approach that uses arbitrary local features for 3D shape retrieval. It avoids the problem of establishing feature correspondences and automatically detects discriminating feature scales. Our experiments on the Princeton Shape Benchmark show that our method outperforms state-of-the-art shape retrieval techniques.