Sciweavers

IPPS
2010
IEEE
13 years 4 months ago
Performance and energy optimization of concurrent pipelined applications
In this paper, we study the problem of finding optimal mappings for several independent but concurrent workflow applications, in order to optimize performance-related criteria tog...
Anne Benoit, Paul Renaud-Goud, Yves Robert
IPPS
2010
IEEE
13 years 4 months ago
Varying bandwidth resource allocation problem with bag constraints
We consider the problem of scheduling jobs on a pool of machines. Each job requires multiple machines on which it executes in parallel. For each job, the input specifies release ti...
Venkatesan T. Chakaravarthy, Vinayaka Pandit, Yogi...
IPPS
2010
IEEE
13 years 4 months ago
Consistency in hindsight: A fully decentralized STM algorithm
Software transactional memory (STM) algorithms often rely on centralized components to achieve atomicity, isolation and consistency. In a distributed setting, centralized...
Annette Bieniusa, Thomas Fuhrmann
IPPS
2010
IEEE
13 years 4 months ago
Direct self-consistent field computations on GPU clusters
Guochun Shi, Volodymyr V. Kindratenko, Ivan S. Ufi...
IPPS
2010
IEEE
13 years 4 months ago
Tile QR factorization with parallel panel processing for multicore architectures
To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist in scheduling a Directed Acyclic Graph (DAG) of...
Bilel Hadri, Hatem Ltaief, Emmanuel Agullo, Jack D...
IPPS
2010
IEEE
13 years 4 months ago
Optimal loop unrolling for GPGPU programs
Giridhar Sreenivasa Murthy, Mahesh Ravishankar, Mu...
IPPS
2010
IEEE
13 years 4 months ago
Improving numerical reproducibility and stability in large-scale numerical simulations on GPUs
The advent of general-purpose graphics processing units (GPGPUs) brings about a whole new platform for running numerically intensive applications at high speeds. Their multi-...
Michela Taufer, Omar Padron, Philip Saponaro, Sand...
IPPS
2010
IEEE
13 years 4 months ago
Linpack evaluation on a supercomputer with heterogeneous accelerators
We report Linpack benchmark results on the TSUBAME supercomputer, a large scale heterogeneous system equipped with NVIDIA Tesla GPUs and ClearSpeed SIMD accelerators. With all of 1...
Toshio Endo, Akira Nukada, Satoshi Matsuoka, Naoya...
IPPS
2010
IEEE
13 years 4 months ago
Dynamic analysis of the relay cache-coherence protocol for distributed transactional memory
Transactional memory is an alternative programming model for managing contention in accessing shared in-memory data objects. Distributed transactional memory (TM) promises to alle...
Bo Zhang, Binoy Ravindran