Sciweavers

619 search results - page 109 / 124
» Programming Distributed Memory Sytems Using OpenMP
Sort
View
ASPLOS
2009
ACM
14 years 8 months ago
Phantom-BTB: a virtualized branch target buffer design
Modern processors use branch target buffers (BTBs) to predict the target address of branches such that they can fetch ahead in the instruction stream increasing concurrency and pe...
Ioana Burcea, Andreas Moshovos
PPOPP
2010
ACM
14 years 4 months ago
Scalable communication protocols for dynamic sparse data exchange
Many large-scale parallel programs follow a bulk synchronous parallel (BSP) structure with distinct computation and communication phases. Although the communication phase in such ...
Torsten Hoefler, Christian Siebert, Andrew Lumsdai...
ICS
2007
Tsinghua U.
14 years 1 months ago
Optimization of data prefetch helper threads with path-expression based statistical modeling
This paper investigates helper threads that improve performance by prefetching data on behalf of an application’s main thread. The focus is data prefetch helper threads that lac...
Tor M. Aamodt, Paul Chow
VISSYM
2004
13 years 9 months ago
Isosurface Computation Made Simple
This paper presents a simple approach for rendering isosurfaces of a scalar field. Using the vertex programming capability of commodity graphics cards, we transfer the cost of com...
Valerio Pascucci
ACMMSP
2006
ACM
260views Hardware» more  ACMMSP 2006»
14 years 1 months ago
Seven at one stroke: results from a cache-oblivious paradigm for scalable matrix algorithms
A blossoming paradigm for block-recursive matrix algorithms is presented that, at once, attains excellent performance measured by • time, • TLB misses, • L1 misses, • L2 m...
Michael D. Adams, David S. Wise