Sciweavers

IPPS
2010
IEEE
14 years 10 months ago
Direct self-consistent field computations on GPU clusters
Guochun Shi, Volodymyr V. Kindratenko, Ivan S. Ufi...
IPPS
2010
IEEE
14 years 10 months ago
Tile QR factorization with parallel panel processing for multicore architectures
To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist in scheduling a Directed Acyclic Graph (DAG) of...
Bilel Hadri, Hatem Ltaief, Emmanuel Agullo, Jack D...
IPPS
2010
IEEE
14 years 10 months ago
Optimal loop unrolling for GPGPU programs
Giridhar Sreenivasa Murthy, Mahesh Ravishankar, Mu...
IPPS
2010
IEEE
14 years 10 months ago
Improving numerical reproducibility and stability in large-scale numerical simulations on GPUs
The advent of general purpose graphics processing units (GPGPUs) brings about a whole new platform for running numerically intensive applications at high speeds. Their multi-...
Michela Taufer, Omar Padron, Philip Saponaro, Sand...
IPPS
2010
IEEE
14 years 10 months ago
Linpack evaluation on a supercomputer with heterogeneous accelerators
We report Linpack benchmark results on the TSUBAME supercomputer, a large scale heterogeneous system equipped with NVIDIA Tesla GPUs and ClearSpeed SIMD accelerators. With all of 1...
Toshio Endo, Akira Nukada, Satoshi Matsuoka, Naoya...
IPPS
2010
IEEE
14 years 10 months ago
Dynamic analysis of the relay cache-coherence protocol for distributed transactional memory
Transactional memory is an alternative programming model for managing contention in accessing shared in-memory data objects. Distributed transactional memory (TM) promises to alle...
Bo Zhang, Binoy Ravindran
IPPS
2010
IEEE
14 years 10 months ago
Dynamic fractional resource scheduling for HPC workloads
Mark Stillwell, Frédéric Vivien, Hen...
IPPS
2010
IEEE
14 years 10 months ago
Fine-grained QoS scheduling for PCM-based main memory systems
With wide adoption of chip multiprocessors (CMPs) in modern computers, there is an increasing demand for large capacity main memory systems. The emerging PCM (Phase Change Memory) ...
Ping Zhou, Yu Du, Youtao Zhang, Jun Yang 0002
IPPS
2010
IEEE
14 years 10 months ago
Broadcasting on large scale heterogeneous platforms under the bounded multi-port model
We consider the problem of broadcasting a large message in a large scale distributed platform. The message must be sent from a source node, with the help of the receiving peers whi...
Olivier Beaumont, Lionel Eyraud-Dubois, Shailesh K...
IPPS
2010
IEEE
14 years 10 months ago
Performance evaluation of concurrent collections on high-performance multicore computing systems
This paper is the first extensive performance study of a recently proposed parallel programming model, called Concurrent Collections (CnC). In CnC, the programmer expresses her co...
Aparna Chandramowlishwaran, Kathleen Knobe, Richar...