Sciweavers

481 search results - page 62 / 97
» Performance Modeling and Measurement of Parallelized Code fo...
Sort
View
IEEEPACT
2002
IEEE
14 years 1 months ago
A Framework for Parallelizing Load/Stores on Embedded Processors
Many modern embedded processors (esp. DSPs) support partitioned memory banks (also called X-Y memory or dual bank memory) along with parallel load/store instructions to achieve co...
Xiaotong Zhuang, Santosh Pande, John S. Greenland ...
ICS
1999
Tsinghua U.
14 years 1 months ago
Realizing the performance potential of the virtual interface architecture
The Virtual Interface (VI) Architecture provides protected userlevel communication with high delivered bandwidth and low permessage latency, particularly for small messages. The V...
Evan Speight, Hazim Abdel-Shafi, John K. Bennett
IEEEPACT
2008
IEEE
14 years 3 months ago
Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor
Moore’s Law and the drive towards performance efficiency have led to the on-chip integration of general-purpose cores with special-purpose accelerators. Pangaea is a heterogeneo...
Henry Wong, Anne Bracy, Ethan Schuchman, Tor M. Aa...
LCPC
2007
Springer
14 years 2 months ago
Multidimensional Blocking in UPC
Abstract. Partitioned Global Address Space (PGAS) languages offer an attractive, high-productivity programming model for programming large-scale parallel machines. PGAS languages, ...
Christopher Barton, Calin Cascaval, George Alm&aac...
PDP
2010
IEEE
14 years 3 months ago
Lessons Learnt Porting Parallelisation Techniques for Irregular Codes to NUMA Systems
—This work presents a study undertaken to characterise the behaviour of some parallelisation techniques for irregular codes, previously developed for SMP architectures, on a seve...
Juan Angel Lorenzo, Juan Carlos Pichel, David LaFr...