Sciweavers

2020 search results - page 17 / 404
» Scalable Instruction-Level Parallelism.
Sort
View
SPAA
2012
ACM
11 years 11 months ago
SALSA: scalable and low synchronization NUMA-aware algorithm for producer-consumer pools
We present a highly-scalable non-blocking producer-consumer task pool, designed with a special emphasis on lightweight synchronization and data locality. The core building block o...
Elad Gidron, Idit Keidar, Dmitri Perelman, Yonatha...
DAC
2002
ACM
14 years 9 months ago
Energy estimation and optimization of embedded VLIW processors based on instruction clustering
Aim of this paper is to propose a methodology for the definition of an instruction-level energy estimation framework for VLIW (Very Long Instruction Word) processors. The power mo...
Andrea Bona, Mariagiovanna Sami, Donatella Sciuto,...
IISWC
2009
IEEE
14 years 3 months ago
Understanding PARSEC performance on contemporary CMPs
PARSEC is a reference application suite used in industry and academia to assess new Chip Multiprocessor (CMP) designs. No investigation to date has profiled PARSEC on real hardwa...
Major Bhadauria, Vincent M. Weaver, Sally A. McKee
IPPS
2005
IEEE
14 years 2 months ago
Control-Flow Independence Reuse via Dynamic Vectorization
Current processors exploit out-of-order execution and branch prediction to improve instruction level parallelism. When a branch prediction is wrong, processors flush the pipeline ...
Alex Pajuelo, Antonio González, Mateo Valer...
EUC
2005
Springer
14 years 2 months ago
Optimizing Nested Loops with Iterational and Instructional Retiming
Abstract. Embedded systems have strict timing and code size requirements. Retiming is one of the most important optimization techniques to improve the execution time of loops by in...
Chun Xue, Zili Shao, Meilin Liu, Mei Kang Qiu, Edw...