Supercomputer performance is highly dependent on its interconnection subsystem design. In this paper we study how different architectural approaches for router design impact into s...
Hydra is a chip multiprocessor (CMP) with integrated support for thread-level speculation. Thread-level speculation provides a way to parallelize sequential programs without the n...
Current high-end parallel systems achieve low-latency, high-bandwidth network communication through the use of aggressive design techniques and expensive mechanical and electrical ...
In this paper we present a processor microarchitecture that can simultaneously execute multiple threads and has a clustered design for scalability purposes. A main feature of the ...
This paper presents a mathematical framework to exploit the semantic properties of matrix operations in loop-based numerical codes. The heart of this framework is an algebraic lan...
The performance of irregular applications on modern computer systems is hurt by the wide gap between CPU and memory speeds because these applications typically underutilize multi-...
John M. Mellor-Crummey, David B. Whalley, Ken Kenn...
In this paper we examine how application performance scales on a state-of-the-art shared virtual memory (SVM) system on a cluster with 64 processors, comprising 4-way SMPs connect...
Dongming Jiang, Brian O'Kelley, Xiang Yu, Sanjeev ...
On modern computers, the performance of programs is often limited by memory latency rather than by processor cycle time. To reduce the impact of memory latency, the restructuring ...
Induprakas Kodukula, Keshav Pingali, Robert Cox, D...
A consistency protocol can be termed symmetric if all processors are treated identically when they access common resources. By contrast, asymmetric protocols usually assign a home...