Embedded real-time multimedia applications usually imply data parallel processing. SIMD processors embedded in SOCs are cost-effective to exploit the underlying parallelism. However, programming applications for SIMD targets requires data placement and operation scheduling which have been proven to be NP-complete problems and beyond present compiler abilities. In this paper we show how our tool (based on concurrent constraint programming) can be used to explore the design space of a kernel in H.264 standard (video compression). Different cost functions are considered (e.g. execution time, memory occupancy, chip cost ...) to derive different source codes from the same functional specification. Future work includes model refinement as well as full code generation for rapid prototyping of such hardware and software intensive systems.