Sciweavers

109 search results - page 15 / 22
» Bounds on Memory Bandwidth in Streamed Computations
Sort
View
IPPS
2010
IEEE
13 years 5 months ago
Optimization of linked list prefix computations on multithreaded GPUs using CUDA
We present a number of optimization techniques to compute prefix sums on linked lists and implement them on multithreaded GPUs using CUDA. Prefix computations on linked structures ...
Zheng Wei, Joseph JáJá
ICS
2004
Tsinghua U.
14 years 28 days ago
CQoS: a framework for enabling QoS in shared caches of CMP platforms
Cache hierarchies have been traditionally designed for usage by a single application, thread or core. As multi-threaded (MT) and multi-core (CMP) platform architectures emerge and...
Ravi R. Iyer
HPCA
2009
IEEE
14 years 8 months ago
Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems
Linked data structure (LDS) accesses are critical to the performance of many large scale applications. Techniques have been proposed to prefetch such accesses. Unfortunately, many...
Eiman Ebrahimi, Onur Mutlu, Yale N. Patt
STOC
2010
ACM
199views Algorithms» more  STOC 2010»
14 years 13 days ago
Zero-One Frequency Laws
Data streams emerged as a critical model for multiple applications that handle vast amounts of data. One of the most influential and celebrated papers in streaming is the “AMSâ...
Vladimir Braverman and Rafail Ostrovsky
ISCA
2002
IEEE
174views Hardware» more  ISCA 2002»
13 years 7 months ago
Efficient Task Partitioning Algorithms for Distributed Shared Memory Systems
In this paper, we consider the tree task graphs which arise from many important programming paradigms such as divide and conquer, branch and bound etc., and the linear task-graphs...
Sibabrata Ray, Hong Jiang