In most real-world situations, a single microphone is insufficient for characterizing an entire auditory scene. This is often the case in office environments, which consist of several interconnected spaces that are at least partially acoustically isolated from one another. To this end, we extend our previous work on segmentation of natural sounds to perform scene characterization using a sparse array of microphones, strategically placed so that every part of the environment is within range of at least one microphone. By accounting for which microphones are active for a given sound event, we perform a multi-channel segmentation that captures sound events occurring in any part of the space. The segmentation is inferred from a custom dynamic Bayesian network (DBN) that models how event boundaries influence changes in audio features. Example recordings illustrate the utility of our approach in a noisy office environment.
Gordon Wichern, Harvey D. Thornburg, Andreas Spanias
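
As a simplified illustration of the microphone-activity idea mentioned in the abstract, the sketch below flags the channels on which a sound event is audible using per-channel short-time energy thresholding. This is only an energy-based stand-in, not the DBN inference described above; the frame length, hop size, and threshold are hypothetical parameters chosen for illustration.

```python
# Illustrative sketch (not the paper's algorithm): decide which microphones
# are "active" via short-time RMS energy thresholding on each channel.
# frame_len, hop, and threshold_db are hypothetical illustration values.
import numpy as np

def active_channels(x, frame_len=1024, hop=512, threshold_db=-40.0):
    """Return a (num_frames, num_channels) boolean mask of active mics.

    x : (num_samples, num_channels) multi-channel recording.
    """
    num_samples, num_channels = x.shape
    num_frames = 1 + (num_samples - frame_len) // hop
    mask = np.zeros((num_frames, num_channels), dtype=bool)
    for t in range(num_frames):
        frame = x[t * hop : t * hop + frame_len]
        rms = np.sqrt(np.mean(frame ** 2, axis=0) + 1e-12)  # per-channel RMS
        mask[t] = 20.0 * np.log10(rms) > threshold_db       # compare in dB
    return mask

# Example: two channels of low-level noise; an event is present only on
# channel 0, so only that microphone should be flagged as active.
rng = np.random.default_rng(0)
x = 1e-4 * rng.standard_normal((16000, 2))
x[4000:8000, 0] += 0.1 * rng.standard_normal(4000)
print(active_channels(x).any(axis=0))  # -> [ True False ]
```

A mask like this could serve as an observed input to a segmentation model, whereas the paper's DBN instead infers event boundaries jointly from changes in audio features across channels.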