Sciweavers

1022 search results - page 164 / 205
» Automatic data and computation decomposition on distributed ...
PPOPP
2006
ACM
14 years 1 month ago
Optimizing irregular shared-memory applications for distributed-memory systems
In prior work, we have proposed techniques to extend the ease of shared-memory parallel programming to distributed-memory platforms by automatic translation of OpenMP programs to ...
Ayon Basumallik, Rudolf Eigenmann
IEEEPACT
2003
IEEE
14 years 28 days ago
Using Software Logging to Support Multi-Version Buffering in Thread-Level Speculation
In Thread-Level Speculation (TLS), speculative tasks generate memory state that cannot simply be combined with the rest of the system because it is unsafe. One way to deal with th...
María Jesús Garzarán, Milos P...
EUROPAR
2009
Springer
14 years 8 days ago
Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs
In response to the constant increase in wire delays, Non-Uniform Cache Architecture (NUCA) has been introduced as an effective memory model for dealing with growing memory latenci...
Javier Lira, Carlos Molina, Antonio Gonzále...
IEEEPACT
2006
IEEE
14 years 1 month ago
Whole-program optimization of global variable layout
On machines with high-performance processors, the memory system continues to be a performance bottleneck. Compilers insert prefetch operations and reorder data accesses to improve...
Nathaniel McIntosh, Sandya Mannarswamy, Robert Hun...
HPCA
2011
IEEE
12 years 11 months ago
A new server I/O architecture for high speed networks
Traditional architectural designs are normally focused on CPUs and have been often decoupled from I/O considerations. They are inefficient for high-speed network processing with a...
Guangdeng Liao, Xia Zhu, Laxmi N. Bhuyan