Sciweavers

165 search results - page 32 / 33
» Thread Cluster Memory Scheduling
PPOPP
2011
ACM
13 years 19 days ago
GRace: a low-overhead mechanism for detecting data races in GPU programs
In recent years, GPUs have emerged as an extremely cost-effective means for achieving high performance. Many application developers, including those with no prior parallel program...
Mai Zheng, Vignesh T. Ravi, Feng Qin, Gagan Agrawa...
CGO
2008
IEEE
14 years 4 months ago
Latency-tolerant software pipelining in a production compiler
In this paper we investigate the benefit of scheduling non-critical loads for a higher latency during software pipelining. "Non-critical" denotes those loads that have s...
Sebastian Winkel, Rakesh Krishnaiyer, Robyn Sampso...
KDD
2009
ACM
14 years 10 months ago
Pervasive parallelism in data mining: dataflow solution to co-clustering large and sparse Netflix data
All Netflix Prize algorithms proposed so far are prohibitively costly for large-scale production systems. In this paper, we describe an efficient dataflow implementation of a coll...
Srivatsava Daruru, Nena M. Marin, Matt Walker, Joy...
ISCA
2008
IEEE
13 years 9 months ago
The Design and Performance of a Bare PC Web Server
There is an increasing need for new Web server architectures that are application-centric, simple, small, and pervasive in nature. In this paper, we present a novel architecture f...
Long He, Ramesh K. Karne, Alexander L. Wijesinha
PDCAT
2009
Springer
14 years 4 months ago
CheCUDA: A Checkpoint/Restart Tool for CUDA Applications
In this paper, a tool named CheCUDA is designed to checkpoint CUDA applications that use GPUs as accelerators. As existing checkpoint/restart implementations do not supp...
Hiroyuki Takizawa, Katsuto Sato, Kazuhiko Komatsu,...