Sciweavers

IPPS
1998
IEEE
14 years 1 month ago
Vector Prefix and Reduction Computation on Coarse-Grained, Distributed-Memory Parallel Machines
Vector prefix and reduction are collective communication primitives in which all processors must cooperate. We present two parallel algorithms, the direct algorithm and the split ...
Seungjo Bae, Dongmin Kim, Sanjay Ranka
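To make the two primitives concrete, here is a minimal sketch of what they compute, assuming p processors simulated as list indices (local[i] is the vector held by processor i) and elementwise addition as the combining operator. The prefix is computed with recursive doubling, a standard log2(p)-round parallel prefix technique; it is NOT the paper's direct or split algorithm, whose details are truncated above. The function names vector_reduce and vector_prefix are hypothetical.

```python
from typing import List

Vector = List[float]


def vector_reduce(local: List[Vector]) -> Vector:
    """Vector reduction: the elementwise sum of every processor's vector."""
    m = len(local[0])
    return [sum(vec[j] for vec in local) for j in range(m)]


def vector_prefix(local: List[Vector]) -> List[Vector]:
    """Inclusive vector prefix via recursive doubling, simulated sequentially.

    Sketch only (assumed operator: elementwise +). After the round that uses
    distance `step`, processor i holds the elementwise sum of the vectors of
    processors max(0, i - 2*step + 1) through i.
    """
    p = len(local)
    result = [list(v) for v in local]  # running partial sum on each processor
    step = 1
    while step < p:
        # Snapshot the previous round so every "processor" reads old values,
        # mimicking a synchronous exchange in which all sends precede updates.
        prev = [list(v) for v in result]
        for i in range(step, p):
            partner = prev[i - step]  # value received from processor i - step
            result[i] = [a + b for a, b in zip(prev[i], partner)]
        step *= 2
    return result


if __name__ == "__main__":
    vectors = [[1, 2], [3, 4], [5, 6], [7, 8]]  # one vector per processor
    print(vector_prefix(vectors))  # [[1, 2], [4, 6], [9, 12], [16, 20]]
    print(vector_reduce(vectors))  # [16, 20]
```

In this sketch the last processor's prefix equals the reduction, which is why the two primitives are usually studied together. The snapshot per round keeps the simulation faithful to a synchronous message-passing step rather than an in-place sequential scan.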
EUROPAR
2001
Springer
14 years 1 month ago
VIA Communication Performance on a Gigabit Ethernet Cluster
As the technology for high-speed networks has evolved over the last decade, the interconnection of commodity computers (e.g., PCs and workstations) at gigabit rates has become a re...
Mark Baker, Paul A. Farrell, Hong Ong, Stephen L. ...
ASPLOS
1998
ACM
14 years 1 month ago
Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine
Advances in VLSI technology will enable chips with over a billion transistors within the next decade. Unfortunately, the centralized-resource architectures of modern microprocesso...
Walter Lee, Rajeev Barua, Matthew Frank, Devabhakt...
IPPS
2006
IEEE
14 years 2 months ago
Performance Analysis of the Reactor Pattern in Network Services
The growing reliance on services provided by software applications places a high premium on the reliable and efficient operation of these applications. A number of these applicat...
Swapna S. Gokhale, Aniruddha S. Gokhale, Jeffrey G...
ICPP
2008
IEEE
14 years 3 months ago
Taming Single-Thread Program Performance on Many Distributed On-Chip L2 Caches
This paper presents a two-part study on managing distributed NUCA (Non-Uniform Cache Architecture) L2 caches in a future manycore processor to obtain high single-thread program per...
Lei Jin, Sangyeun Cho