Auditory data provide rich contextual cues about the environments in which a sensing device operates. The goal of audio-based context recognition is to equip sensing devices with classification algorithms that automatically assign environments to pre-defined classes based on extracted auditory features. In this paper, we first extract various features from the audio signals. We then perform a feature analysis to identify a feature ensemble that optimally discriminates between contexts. To achieve efficient and timely online classification, we adopt a coarse-to-fine training scheme in which three HMMs are trained for each context using feature ensembles of different complexities. During online recognition, we start with the coarse HMMs (which use the fewest features) and progressively apply finer models only when necessary. Experiments show that this strategy yields significant savings in computational power with only a negligible loss in context recognition accuracy.
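The coarse-to-fine recognition loop described above can be sketched as a cascade: score the input with the cheapest models first, and escalate to finer (more feature-rich, costlier) models only when the decision is not confident. The sketch below is illustrative, not the paper's implementation; each "model" is a stand-in log-likelihood scorer, and the margin threshold, feature subsets, and tier structure are assumptions for demonstration.

```python
def classify_cascade(feature_sets, model_tiers, margin_threshold=2.0):
    """Classify via a coarse-to-fine cascade of per-context scorers.

    feature_sets: list of feature vectors, one per tier, from coarsest
        (fewest features) to finest.
    model_tiers: list of dicts mapping context label -> scoring function
        (a stand-in for an HMM log-likelihood), aligned with feature_sets.
    margin_threshold: hypothetical confidence margin; if the top two
        contexts are separated by at least this much, stop early.
    """
    decision = None
    for features, models in zip(feature_sets, model_tiers):
        scores = {ctx: score_fn(features) for ctx, score_fn in models.items()}
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        decision = ranked[0][0]
        margin = ranked[0][1] - ranked[1][1]
        if margin >= margin_threshold:
            break  # confident enough; skip the finer, costlier models
    return decision


# Toy usage: a coarse tier that is ambiguous forces escalation to a
# fine tier that separates the two contexts cleanly.
coarse = {
    "street": lambda f: -abs(f[0] - 1.0),
    "office": lambda f: -abs(f[0] - 0.5),
}
fine = {
    "street": lambda f: -(abs(f[0] - 1.0) + abs(f[1] - 3.0)),
    "office": lambda f: -(abs(f[0] - 0.5) + abs(f[1] - 0.0)),
}
label = classify_cascade([[1.0], [1.0, 3.0]], [coarse, fine])
```

The computational saving comes from the early `break`: easy inputs never touch the finer models, mirroring the paper's strategy of applying finer HMMs only when necessary.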