Many digital sound archives still suffer from tremendous problems concerning access. Materials are often in different formats, with related media in separate collections, and with...
Graphics and vision are approximate inverses of each other: ordinarily Graphics Processing Units (GPUs) are used to convert “numbers into pictures” (i.e. computer graphics). I...
We consider how to exploit the correlation in images for compression by studying image patches in a nonparametric manner. Instead of extracting and recording parameters, ...
This work introduces a novel data mining scheme, spatial pyramid mining, to discover association rules at multiple resolutions in order to identify frequent spatial configuration...
Multiview video coding (MVC) is currently being standardized by the Joint Video Team as an extension of H.264/AVC. When an MVC bitstream is decoded, some views (named target views)...
Ying Chen, Ye-Kui Wang, Miska M. Hannuksela, Monce...
We address the problem of keyword spotting in continuous speech streams when training and testing conditions can be different. We propose a keyword spotting algorithm based on spa...
Several stochastic models provide an effective framework to identify the temporal structure of audiovisual data. Most of them need as input a first video structure, i.e. connecti...
We propose a method for separating accompaniment from polyphonic music and its karaoke application, both based on automatic melody transcription. First, the method transcribes the...
Visual and auditory forms have some noticeable associations that can inspire similar cognitive and aesthetic experiences. This paper presents a study on the possibilities of app...