Brian Whitman, Ryan M. Rifkin

Abstract—We present the query-by-description (QBD) component of “Kandem,” a time-aware music retrieval system. The QBD system learns a relation between descriptive text about a musical artist and that artist’s actual acoustic output, making queries such as “Play me something loud with an electronic beat” possible purely by analyzing the audio content of a database. We describe a novel machine learning technique based on Regularized Least-Squares Classification (RLSC) that quickly and efficiently learns the non-linear relation between descriptive language and audio features by treating the problem as a large number of output classes tied to a shared set of input features. We also show how RLSC training readily eliminates irrelevant labels.
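Because every descriptive term is predicted from the same set of audio features, all per-term RLSC classifiers can be obtained from a single regularized kernel system. The following Python sketch illustrates that shared-kernel idea under stated assumptions: the Gaussian kernel, the regularization constant, and the random stand-in features and labels are illustrative choices of ours, not parameters or data from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    """Gaussian (RBF) Gram matrix between rows of A and rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def train_rlsc(X, Y, lam=1e-3, gamma=0.1):
    """Fit one RLSC classifier per label column of Y.

    X : (n, d) audio feature vectors.
    Y : (n, k) +1/-1 label matrix, one column per descriptive term.
    Because every term shares the same Gram matrix K, a single solve of
    (K + lam*n*I) C = Y yields coefficients for all k terms at once.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    C = np.linalg.solve(K + lam * n * np.eye(n), Y)  # (n, k) coefficients
    return C

def predict_rlsc(C, X_train, X_test, gamma=0.1):
    """Real-valued scores for each test point and each descriptive term."""
    return rbf_kernel(X_test, X_train, gamma) @ C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))           # stand-in audio features
    Y = np.sign(rng.normal(size=(200, 50)))  # stand-in term labels ("loud", ...)
    C = train_rlsc(X, Y)
    scores = predict_rlsc(C, X, X[:5])
    print(scores.shape)  # (5, 50): one score per query point per term
```

Terms whose scores carry no signal on held-out data (the “irrelevant labels” noted above) can simply have their columns dropped without retraining the rest, since each column of C is independent given the shared factorization.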