Sciweavers

64 search results - page 4 / 13
» Efficient Parallelization of Unstructured Reductions on Shar...
ISCA
2002
IEEE
13 years 6 months ago
Efficient Task Partitioning Algorithms for Distributed Shared Memory Systems
In this paper, we consider tree task graphs, which arise from many important programming paradigms such as divide and conquer, branch and bound, etc., and the linear task-graphs...
Sibabrata Ray, Hong Jiang
SPAA
2003
ACM
14 years 9 days ago
Quantifying instruction criticality for shared memory multiprocessors
Recent research on processor microarchitecture suggests using instruction criticality as a metric to guide hardware control policies. Fields et al. [3, 4] have proposed a directed...
Tong Li, Alvin R. Lebeck, Daniel J. Sorin
IPPS
2010
IEEE
13 years 5 months ago
A PRAM-NUMA model of computation for addressing low-TLP workloads
It is possible to implement the parallel random access machine (PRAM) on a chip multiprocessor (CMP) efficiently with an emulated shared memory (ESM) architecture to gain easy par...
Martti Forsell
ISCAS
2005
IEEE
14 years 20 days ago
Scheduling algorithm for partially parallel architecture of LDPC decoder by matrix permutation
The fully parallel LDPC decoding architecture can achieve high decoding throughput, but it suffers from large hardware complexity caused by a large set of processing units and ...
In-Cheol Park, Se-Hyeon Kang
IPPS
2000
IEEE
13 years 11 months ago
Fault-Tolerant Distributed-Shared-Memory on a Broadcast-Based Interconnection Network
The Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) is a low-latency, high-bandwidth interconnection network which directly links arbitrary pairs of processor nodes wit...
Diana Hecht, Constantine Katsinis