Distributed local memories, or scratchpads, have been shown to effectively reduce cost and power consumption of application-specific accelerators while maintaining performance. Th...
Manjunath Kudlur, Kevin Fan, Michael L. Chu, Scott...
Stream languages explicitly describe fork-join parallelism and pipelines, offering a powerful programming model for many-core Multi-Processor Systems on Chip (MPSoC). In an embedd...
Abstract. Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), b...
Elkin Garcia, Ioannis E. Venetis, Rishi Khan, Guan...
Systolic implementations of dynamic programming solutions that utilize a similarity matrix can achieve appreciable performance with both course- and fine-grain parallelization. A ...
The emergence of multicore architectures and highly scalable platforms motivates the development of novel algorithms and techniques that emphasize concurrency and are tolerant of ...