We present a freely available benchmark dataset for audio classification and clustering. This dataset consists of 10 seconds samples of 1886 songs obtained from the Garageband site. Beside the audio clips themselves, textual meta data is provided for the individual songs. The songs are classified into 9 genres. In addition to the genre information, our dataset also consists of 24 hierarchical cluster models created manually by a group of users. This enables a user centric evaluation of audio classification and clustering algorithms and gives researchers the opportunity to test the performance of their methods on heterogeneous data. We first give a motivation for assembling our benchmark dataset. Then we describe the dataset and its elements in more detail. Finally, we present some initial results using a set of audio features generated by a feature construction approach.