Sciweavers

CLUSTER 2007, IEEE
Efficient asynchronous memory copy operations on multi-core systems and I/OAT
Bulk memory copies incur large overheads such as CPU stalling (i.e., no overlap of computation with memory copy operation), small register-size data movement, cache pollution, etc...
Karthikeyan Vaidyanathan, Lei Chai, Wei Huang, Dha...

ICPP 2003, IEEE
Enabling Partial Cache Line Prefetching Through Data Compression
Hardware prefetching is a simple and effective technique for hiding cache miss latency and thus improving the overall performance. However, it comes with addition of prefetch buff...
Youtao Zhang, Rajiv Gupta

EXPCS 2007
Analysis of input-dependent program behavior using active profiling
Utility programs, which perform similar and largely independent operations on a sequence of inputs, include such common applications as compilers, interpreters, and document parse...
Xipeng Shen, Michael L. Scott, Chengliang Zhang, S...

ISCA 1997, IEEE
A Language for Describing Predictors and Its Application to Automatic Synthesis
As processor architectures have increased their reliance on speculative execution to improve performance, the importance of accurate prediction of what to execute speculatively ha...
Joel S. Emer, Nicholas C. Gloy

DATE 2008, IEEE
Cache Aware Mapping of Streaming Applications on a Multiprocessor System-on-Chip
Efficient use of the memory hierarchy is critical for achieving high performance in a multiprocessor system-on-chip. An external memory that is shared between processors is a bottl...
Arno Moonen, Marco Bekooij, Rene van den Berg, Jef...