Sciweavers

403 search results - page 56 / 81
» On Using Incremental Profiling for the Performance Analysis ...
EUROPAR
2006
Springer
14 years 11 days ago
Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences
Abstract. This paper presents a study of performance optimization of dense matrix multiplication on the IBM Cyclops-64 (C64) chip architecture. Although much has been published on how t...
Ziang Hu, Juan del Cuvillo, Weirong Zhu, Guang R. ...
ISCA
1996
IEEE
14 years 26 days ago
Evaluation of Design Alternatives for a Multiprocessor Microprocessor
In the future, advanced integrated circuit processing and packaging technology will allow for several design options for multiprocessor microprocessors. In this paper we consider ...
Basem A. Nayfeh, Lance Hammond, Kunle Olukotun
IPPS
2006
IEEE
14 years 2 months ago
Dual-layered file cache on cc-NUMA system
CC-NUMA is a widely adopted and deployed architecture of high performance computers. These machines are attractive for their transparent access to local and remote memory. However...
Zhou Yingchao, Meng Dan, Ma Jie
FCCM
2002
IEEE
14 years 1 month ago
Coarse-Grain Pipelining on Multiple FPGA Architectures
Reconfigurable systems, and in particular, FPGA-based custom computing machines, offer a unique opportunity to define application-specific architectures. These architectures offer...
Heidi E. Ziegler, Byoungro So, Mary W. Hall, Pedro...
ISPASS
2010
IEEE
14 years 3 months ago
Synthesizing memory-level parallelism aware miniature clones for SPEC CPU2006 and ImplantBench workloads
Abstract—We generate and provide miniature synthetic benchmark clones for modern workloads to solve two pre-silicon design challenges, namely: 1) huge simulation time (weeks to m...
Karthik Ganesan, Jungho Jo, Lizy K. John