Sciweavers

» Automatic data and computation decomposition on distributed ...
PPOPP
1990
ACM
13 years 11 months ago
Employing Register Channels for the Exploitation of Instruction Level Parallelism
Abstract - A multiprocessor system capable of exploiting fine-grained parallelism must support efficient synchronization and data passing mechanisms. This paper demonstrates the us...
Rajiv Gupta
EUROPAR
2008
Springer
13 years 9 months ago
Efficiently Building the Gated Single Assignment Form in Codes with Pointers in Modern Optimizing Compilers
Abstract. Understanding program behavior is at the foundation of program optimization. Techniques for automatic recognition of program constructs characterize the behavior of code ...
Manuel Arenaz, Pedro Amoedo, Juan Touriño
HOTI
2005
IEEE
14 years 1 month ago
Long Round-Trip Time Support with Shared-Memory Crosspoint Buffered Packet Switch
The amount of memory in buffered crossbars in combined input-crosspoint buffered switches is proportional to the number of crosspoints, or O(N^2), where N is the number of port...
Ziqian Dong, Roberto Rojas-Cessa
IEEEPACT
2006
IEEE
14 years 1 month ago
Overlapping dependent loads with addressless preload
Modern out-of-order processors with non-blocking caches exploit Memory-Level Parallelism (MLP) by overlapping cache misses in a wide instruction window. The exploitation of MLP, h...
Zhen Yang, Xudong Shi, Feiqi Su, Jih-Kwon Peir
HPCA
2006
IEEE
14 years 8 months ago
LogTM: log-based transactional memory
Transactional memory (TM) simplifies parallel programming by guaranteeing that transactions appear to execute atomically and in isolation. Implementing these properties includes p...
Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan...