Sciweavers

42 search results - page 8 / 9
» Memory Requirement Optimization with Loop Fusion and Loop Sh...
PLDI
2009
ACM
14 years 2 months ago
Binary analysis for measurement and attribution of program performance
Modern programs frequently employ sophisticated modular designs. As a result, performance problems cannot be identified from costs attributed to routines in isolation; understand...
Nathan R. Tallent, John M. Mellor-Crummey, Michael...
PDP
2008
IEEE
14 years 1 month ago
Out-of-Core Wavefront Computations with Reduced Synchronization
Matrix computation algorithms often exhibit dependencies between neighboring elements inside loop nests such that the frontier between computed elements and those to be computed w...
Pierre-Nicolas Clauss, Jens Gustedt, Frédé...
CIVR
2007
Springer
14 years 1 month ago
Fast and cheap object recognition by linear combination of views
In this paper, we present a real-time algorithm for 3D object detection in images. Our method relies on the Ullman and Basri [13] theory which claims that the same object under di...
Jérome Revaud, Guillaume Lavoué, Yas...
ASPLOS
2004
ACM
14 years 26 days ago
Programming with transactional coherence and consistency (TCC)
Transactional Coherence and Consistency (TCC) offers a way to simplify parallel programming by executing all code within transactions. In TCC systems, transactions serve as the fu...
Lance Hammond, Brian D. Carlstrom, Vicky Wong, Ben...
CF
2006
ACM
13 years 9 months ago
Intermediately executed code is the key to find refactorings that improve temporal data locality
The growing speed gap between memory and processor makes an efficient use of the cache ever more important to reach high performance. One of the most important ways to improve cac...
Kristof Beyls, Erik H. D'Hollander