The Sort operation is a core part of many critical applications. Despite the large efforts to parallelize it, the fact that it suffers from high data-dependencies vastly limits it...
Layali K. Rashid, Wessam Hassanein, Moustafa A. Ha...
Abstract--The centralized structures necessary for the extraction of instruction-level parallelism (ILP) are consuming progressively smaller portions of the total die area of chip ...
—The number of CPUs in chip multiprocessors is growing at the Moore’s Law rate, due to continued technology advances. However, new technologies pose serious reliability challen...
This paper presents a scalable and partitionable asynchronous bus arbiter for use with chip multiprocessors (CMP) and its corresponding pre-layout simulation results using VHDL. T...
- Bioinformatics applications such as gene and protein sequence matching algorithms are characterized by the need to process large amounts of data. While uni-processor performance ...
Ravi Kiran Karanam, Arun Ravindran, Arindam Mukher...
This paper presents the FPGA implementation of the prototype for the Data-Driven Chip-Multiprocessor (D2-CMP). In particular, we study the implementation of a Thread Synchronizati...
Prior research shows that database system performance is dominated by off-chip data stalls, resulting in a concerted effort to bring data into on-chip caches. At the same time, hi...
Nikos Hardavellas, Ippokratis Pandis, Ryan Johnson...
Thread-level speculative execution is a technique that makes it possible for a wider range of single-threaded applications to make use of the processing resources in a chip multip...
- In a multi-core system, power and performance may be dynamically traded off by utilizing power management (PM). This paper addresses the problem of minimizing the total power con...
Mohammad Ghasemazar, Ehsan Pakbaznia, Massoud Pedr...
Abstract. This paper presents a scalable and partitionable asynchronous bus arbiter for use with chip multiprocessors (CMP) and its corresponding pre-layout simulation results usin...