Sciweavers

656 search results - page 64 / 132
» Scalable Parallel Matrix Multiplication on Distributed Memor...
CCGRID
2008
IEEE
14 years 2 months ago
Optimized Distributed Data Sharing Substrate in Multi-core Commodity Clusters: A Comprehensive Study with Applications
Distributed applications tend to have a complex design due to issues such as concurrency, synchronization and communication. Researchers in the past have proposed abstractions to ...
Karthikeyan Vaidyanathan, Ping Lai, Sundeep Narrav...
IPPS
2006
IEEE
14 years 1 month ago
GPU-ABiSort: optimal parallel sorting on stream architectures
In this paper, we present a novel approach for parallel sorting on stream processing architectures. It is based on adaptive bitonic sorting. For sorting n values utilizing p strea...
Alexander Greß, Gabriel Zachmann
HPCA
2001
IEEE
14 years 8 months ago
A New Scalable Directory Architecture for Large-Scale Multiprocessors
The memory overhead introduced by directories constitutes a major hurdle in the scalability of cc-NUMA architectures, which makes the shared-memory paradigm unfeasible for very la...
Manuel E. Acacio, José González, Jos...
ICA3PP
2007
Springer
13 years 9 months ago
The Thread Migration Mechanism of DSM-PEPE
In this paper we present the thread migration mechanism of DSM-PEPE, a multithreaded distributed shared memory system. DSM systems like DSM-PEPE provide a parallel environment to h...
Federico Meza, Cristian Ruz
PPOPP
2012
ACM
12 years 3 months ago
PARRAY: a unifying array representation for heterogeneous parallelism
This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports system-level succinct programming for heterogeneous parallel systems like GPU c...
Yifeng Chen, Xiang Cui, Hong Mei