Usual cache optimisation techniques for high performance computing are difficult to apply in embedded VLIW applications. First, embedded applications are not always well structured, and few regular loop nests exist. Real world applications in embedded computing contain hot loops with pointers, indirect arrays accesses, function calls, indirect function calls, non constant stride accesses, etc. Consequently, loop transformations [12] for reducing cache misses are impossible to apply, especially at the back-end level. Second, the strides of memory accesses do not appear to be constant at source code level, because of indirect accesses. Hence, usual prefetching techniques are not applicable. Third, embedded VLIW processors are ”cheap” products, they have limited hardware dynamic mechanisms compared to high performance processors [8]: no out-oforder execution, reduced memory hierarchies, small direct mapped caches, lower clock frequencies, etc. Consequently, the code optimisations me...