In this paper, we address the temporal alignment of captured communicative gestures that convey different styles. We propose a representation space that can be considered robust to the spatial variability induced by style. By extending a multilevel dynamic time warping (DTW) algorithm, we show that the extension achieves time correspondence between gesture sequences while avoiding the jerkiness introduced by standard time warping methods.
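The jerkiness of standard time warping can be made concrete with a minimal sketch of single-level DTW on 1-D signals. This sketch rests on several assumptions. The paper works on multidimensional captured motion in its own representation space. Its multilevel extension is not reproduced here. The absolute difference as local cost and the helper names `dtw`, `warp_to_reference` and `count_holds` are hypothetical. Whenever the optimal path stays on one source frame while the reference timeline advances, that frame is repeated. In the warped motion, the result is a hold followed by a jump.

```python
from typing import List, Sequence, Tuple


def dtw(a: Sequence[float], b: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Standard single-level DTW between two 1-D sequences (illustrative sketch).

    Returns the accumulated cost and the optimal warping path as (i, j) pairs.
    The local cost |a[i] - b[j]| is an assumption made for this example.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from (n, m) to (0, 0); ties prefer the diagonal step
    i, j = n, m
    path: List[Tuple[int, int]] = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): D[i - 1][j - 1],
            (i - 1, j): D[i - 1][j],
            (i, j - 1): D[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return D[n][m], path


def warp_to_reference(a: Sequence[float], path: List[Tuple[int, int]], m: int) -> list:
    """Resample `a` onto the reference timeline of length m using the path.

    If several frames of `a` map to one reference frame, the first is kept;
    if one frame of `a` maps to several reference frames, it is repeated.
    """
    out = [None] * m
    for i, j in path:
        if out[j] is None:
            out[j] = a[i]
    return out


def count_holds(path: List[Tuple[int, int]]) -> int:
    """Count path steps where the source index does not advance (a held frame)."""
    return sum(1 for (i0, _), (i1, _) in zip(path, path[1:]) if i1 == i0)
```

For example, aligning `a = [0, 1, 2, 3]` to `b = [0, 0, 0, 1, 2, 3]` gives a zero-cost path. That path holds `a[0]` for three reference frames, so `count_holds` reports two held steps. An approach that seeks smoother correspondences, such as the multilevel extension described above, aims to reduce this kind of artifact.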