Sciweavers

280 search results - page 40 / 56
» Preliminary Results from a Parallel MATLAB Compiler
SC
2005
ACM
14 years 1 month ago
An Application-Based Performance Characterization of the Columbia Supercluster
Columbia is a 10,240-processor supercluster consisting of 20 Altix nodes with 512 processors each, and currently ranked as one of the fastest computers in the world. In this paper...
Rupak Biswas, M. Jahed Djomehri, Robert Hood, Haoq...
ISHPC
2003
Springer
14 years 25 days ago
Code and Data Transformations for Improving Shared Cache Performance on SMT Processors
Simultaneous multithreaded processors use shared on-chip caches, which yield better cost-performance ratios. Sharing a cache between simultaneously executing threads causes excessi...
Dimitrios S. Nikolopoulos
PPOPP
2010
ACM
14 years 4 months ago
Using data structure knowledge for efficient lock generation and strong atomicity
To achieve high performance on multicore systems, shared-memory parallel languages must efficiently implement atomic operations. The commonly used and studied paradigms for atomici...
Gautam Upadhyaya, Samuel P. Midkiff, Vijay S. Pai
FCCM
2011
IEEE
12 years 11 months ago
Reducing the Energy Cost of Irregular Code Bases in Soft Processor Systems
This paper describes an architecture and FPGA synthesis toolchain for building specialized, energy-saving coprocessors called Irregular Code Energy Reducers (ICERs) for a wide ...
Manish Arora, Jack Sampson, Nathan Goulding-Hotta,...
CISIS
2010
IEEE
14 years 2 months ago
Threaded Dynamic Memory Management in Many-Core Processors
Current trends in desktop processor design have been toward many-core solutions with increased parallelism. As the number of supported threads grows in these processors, it may ...
Edward C. Herrmann, Philip A. Wilsey