Sciweavers

643 search results - page 98 / 129
» Using Hardware Counters to Automatically Improve Memory Perf...
Sort
View
DATE
2007
IEEE
72views Hardware» more  DATE 2007»
14 years 3 months ago
The impact of loop unrolling on controller delay in high level synthesis
Loop unrolling is a well-known compiler optimization that can lead to significant performance improvements. When used in High Level Synthesis (HLS) unrolling can affect the contr...
Srikanth Kurra, Neeraj Kumar Singh, Preeti Ranjan ...
IEEEPACT
2008
IEEE
14 years 3 months ago
Exploiting loop-dependent stream reuse for stream processors
The memory access limits the performance of stream processors. By exploiting the reuse of data held in the Stream Register File (SRF), an on-chip storage, the number of memory acc...
Xuejun Yang, Ying Zhang, Jingling Xue, Ian Rogers,...
ASPLOS
2009
ACM
14 years 9 months ago
Understanding software approaches for GPGPU reliability
Even though graphics processors (GPUs) are becoming increasingly popular for general purpose computing, current (and likely near future) generations of GPUs do not provide hardwar...
Martin Dimitrov, Mike Mantor, Huiyang Zhou
ISCA
2008
IEEE
148views Hardware» more  ISCA 2008»
14 years 3 months ago
Atomic Vector Operations on Chip Multiprocessors
The current trend is for processors to deliver dramatic improvements in parallel performance while only modestly improving serial performance. Parallel performance is harvested th...
Sanjeev Kumar, Daehyun Kim, Mikhail Smelyanskiy, Y...
CASES
2007
ACM
14 years 25 days ago
A low power front-end for embedded processors using a block-aware instruction set
Energy, power, and area efficiency are critical design concerns for embedded processors. Much of the energy of a typical embedded processor is consumed in the front-end since inst...
Ahmad Zmily, Christos Kozyrakis