Multimedia vector instruction sets are becoming ubiquitous in most embedded systems used for multimedia, networking and communications. However, current compiler technology does not allow for efficient exploitation of the data parallelism inherent in many signal processing and multimedia applications. In this paper, we explore the automatic vectorization of embedded applications. In particular, we focus on algorithms in which the same computations are applied to a set of signals that are processed simultaneously. This set of signals is usually represented as a 2D array in which each row is an input signal to be filtered in some way. A motivating example, inspired by VoIP processing, shows that state-of-the-art vectorizing compilers exploit the data parallelism inherent in this kind of application inefficiently. One of the main reasons for this is that these applications present inner loops that carry all the dependencies and external loops...
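The loop structure described above can be illustrated with a minimal C sketch. It assumes a first-order recursive (IIR) filter, y[n] = a*y[n-1] + x[n] with a Q15 fixed-point coefficient, as the per-signal computation, and assumes each row of the 2D array is filtered independently. The filter, the sizes `NSIG` and `NSAMP`, and the functions `filter_rows`, `filter_across_signals` and `orders_agree` are hypothetical illustrations, not the paper's VoIP kernel. In the original order the inner time loop carries a dependence; interchanging the loops puts the independent per-signal iterations innermost, where a vectorizer could map them onto SIMD lanes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NSIG  4   /* hypothetical number of signals (rows of the 2D array) */
#define NSAMP 8   /* hypothetical number of samples per signal */

/* Original order: the inner loop walks along one signal in time.
 * Each iteration needs prev from the previous iteration (a loop-carried
 * dependence), so this inner loop cannot be vectorized directly. */
static void filter_rows(int32_t a_q15, int32_t x[NSIG][NSAMP],
                        int32_t y[NSIG][NSAMP]) {
    for (int s = 0; s < NSIG; s++) {
        int32_t prev = 0;
        for (int n = 0; n < NSAMP; n++) {
            prev = ((a_q15 * prev) >> 15) + x[s][n]; /* y[n] = a*y[n-1] + x[n] */
            y[s][n] = prev;
        }
    }
}

/* Interchanged order: time is the outer loop and the inner loop runs
 * across signals. Inner iterations touch different rows and share no
 * state, so they are independent and could be executed as SIMD lanes. */
static void filter_across_signals(int32_t a_q15, int32_t x[NSIG][NSAMP],
                                  int32_t y[NSIG][NSAMP]) {
    int32_t prev[NSIG] = {0}; /* one filter state per signal */
    for (int n = 0; n < NSAMP; n++) {
        for (int s = 0; s < NSIG; s++) {
            prev[s] = ((a_q15 * prev[s]) >> 15) + x[s][n];
            y[s][n] = prev[s];
        }
    }
}

/* Fills a small nonnegative input and returns 1 if both loop orders
 * produce identical outputs (same operations, same order per signal). */
static int orders_agree(void) {
    int32_t x[NSIG][NSAMP], y1[NSIG][NSAMP], y2[NSIG][NSAMP];
    for (int s = 0; s < NSIG; s++)
        for (int n = 0; n < NSAMP; n++)
            x[s][n] = (int32_t)(100 * (s + 1) + n);
    const int32_t a_q15 = 16384; /* 0.5 in Q15 */
    filter_rows(a_q15, x, y1);
    filter_across_signals(a_q15, x, y2);
    return memcmp(y1, y2, sizeof y1) == 0;
}
```

In the interchanged version the inner loop reads `x[s][n]` with a stride of `NSAMP` elements, so a practical SIMD implementation would likely also change the data layout (for example, interleaving samples across signals) to obtain contiguous loads. Loop interchange is used here only as one plausible transformation; the sketch does not claim to reproduce the approach proposed in the paper.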