Sciweavers

» Comparative evaluation of memory models for chip multiproces...
HIPC
2004
Springer
14 years 1 months ago
Lock-Free Parallel Algorithms: An Experimental Study
Abstract. Lock-free shared data structures in the setting of distributed computing have received a fair amount of attention. Major motivations of lock-free data structures include ...
Guojing Cong, David A. Bader
PVM
2007
Springer
14 years 1 months ago
Parallelizing Dense Linear Algebra Operations with Task Queues in
llc is a language based on C where parallelism is expressed using compiler directives. The llc compiler produces MPI code which can be ported to both shared and distributed memory ...
Antonio J. Dorta, José M. Badía, Enr...
ISCA
2005
IEEE
14 years 1 months ago
Optimizing Replication, Communication, and Capacity Allocation in CMPs
Chip multiprocessors (CMPs) substantially increase capacity pressure on the on-chip memory hierarchy while requiring fast access. Neither private nor shared caches can provide bot...
Zeshan Chishti, Michael D. Powell, T. N. Vijaykuma...
DATE
2008
IEEE
14 years 2 months ago
Cycle-approximate Retargetable Performance Estimation at the Transaction Level
This paper presents a novel cycle-approximate performance estimation technique for automatically generated transaction level models (TLMs) for heterogeneous multicore designs. The...
Yonghyun Hwang, Samar Abdi, Daniel Gajski
MICRO
1998
IEEE
13 years 12 months ago
A Bandwidth-efficient Architecture for Media Processing
Media applications are characterized by large amounts of available parallelism, little data reuse, and a high computation to memory access ratio. While these characteristics are p...
Scott Rixner, William J. Dally, Ujval J. Kapasi, B...