Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memo...
Byunghyun Jang, Perhaad Mistry, Dana Schaa, Rodrig...
Exploiting instruction-level parallelism (ILP) is extremely important for achieving high performance in application specific instruction set processors (ASIPs) and embedded process...
Ramaswamy Govindarajan, Erik R. Altman, Guang R. G...
Basic retiming is an algorithm originally developed for hardware optimization. Software pipelining is a technique proposed to increase instruction-level parallelism for parallel p...
—The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general purpose multicore architectures. This model allows programmers to sp...
Abhishek Udupa, R. Govindarajan, Matthew J. Thazhu...
Abstract--Multimedia and DSP applications have several computationally intensive kernels which are often offloaded and accelerated by application-specific hardware. This paper pres...
Sejong Oh, Tag Gon Kim, Jeonghun Cho, Elaheh Bozor...