The designs of high-performance processor architectures are moving toward the integration of a large number of multiple processing cores on a single chip. The IBM Cyclops-64 (C64)...
In this paper, we present a novel approach for parallel sorting on stream processing architectures. It is based on adaptive bitonic sorting. For sorting n values utilizing p strea...
For transactional memory (TM) to achieve widespread acceptance, transactions should not be limited to the physical resources of any specific hardware implementation. TM systems s...
JaeWoong Chung, Chi Cao Minh, Austen McDonald, Tra...
Memory latency tolerant architectures support thousands of in-flight instructions without scaling cyclecritical processor resources, and thousands of useful instructions can compl...
Amit Gandhi, Haitham Akkary, Ravi Rajwar, Srikanth...
Automatic circuit placement has received renewed interest recently given the rapid increase of circuit complexity, increase of interconnect delay, and potential sub-optimality of ...