Sciweavers

PPOPP
2010
ACM
14 years 8 months ago
Scaling LAPACK panel operations using parallel cache assignment
In LAPACK many matrix operations are cast as block algorithms which iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high perf...
Anthony M. Castaldo, R. Clint Whaley
PPOPP
2010
ACM
14 years 8 months ago
Leveraging parallel nesting in transactional memory
Exploiting the emerging reality of affordable multi-core architectures goes through providing programmers with simple abstractions that would enable them to easily turn their sequential p...
João Barreto, Aleksandar Dragojevic, Paulo ...
PPOPP
2010
ACM
14 years 8 months ago
Modeling advanced collective communication algorithms on cell-based systems
This paper presents and validates performance models for a variety of high-performance collective communication algorithms for systems with Cell processors. The systems modeled in...
Qasim Ali, Samuel P. Midkiff, Vijay S. Pai
PPOPP
2010
ACM
14 years 8 months ago
Application heartbeats for software performance and health
Adaptive, or self-aware, computing has been proposed to help application programmers confront the growing complexity of multicore software development. However, existing approache...
Henry Hoffmann, Jonathan Eastep, Marco D. Santambr...
PPOPP
2010
ACM
14 years 8 months ago
Modeling transactional memory workload performance
Transactional memory promises to make parallel programming easier than with fine-grained locking, while performing just as well. This performance claim is not always borne out bec...
Donald E. Porter, Emmett Witchel
PPOPP
2010
ACM
14 years 8 months ago
Lazy binary-splitting: a run-time adaptive work-stealing scheduler
We present Lazy Binary Splitting (LBS), a user-level scheduler of nested parallelism for shared-memory multiprocessors that builds on existing Eager Binary Splitting work-stealing...
Alexandros Tzannes, George C. Caragea, Rajeev Baru...
PPOPP
2010
ACM
14 years 8 months ago
Scalable communication protocols for dynamic sparse data exchange
Many large-scale parallel programs follow a bulk synchronous parallel (BSP) structure with distinct computation and communication phases. Although the communication phase in such ...
Torsten Hoefler, Christian Siebert, Andrew Lumsdai...
PPOPP
2010
ACM
14 years 8 months ago
GAMBIT: effective unit testing for concurrency libraries
As concurrent programming becomes prevalent, software providers are investing in concurrency libraries to improve programmer productivity. Concurrency libraries improve productivi...
Katherine E. Coons, Sebastian Burckhardt, Madanlal...
PPOPP
2010
ACM
14 years 8 months ago
Debugging programs that use atomic blocks and transactional memory
Ferad Zyulkyarov, Tim Harris, Osman S. Unsal, Adri...
PPOPP
2010
ACM
14 years 8 months ago
NOrec: streamlining STM by abolishing ownership records
Drawing inspiration from several previous projects, we present an ownership-record-free software transactional memory (STM) system that combines extremely low overhead with unusua...
Luke Dalessandro, Michael F. Spear, Michael L. Sco...