Sciweavers

1001 search results - page 54 / 201
» Improving memory hierarchy performance for irregular applica...
Sort
View
HIPC
2009
Springer
15 years 3 months ago
Optimizing the use of GPU memory in applications with large data sets
Abstract--With General Purpose programmable GPUs becoming more and more popular, automated tools are needed to bridge the gap between achievable performance from highly parallel ar...
Nadathur Satish, Narayanan Sundaram, Kurt Keutzer
ICPP
2003
IEEE
15 years 11 months ago
Performance and Power Impact of Issue-width in Chip-Multiprocessor Cores
In chip-multiprocessors (CMPs), the number of cores and the issue width of each core presents an important design trade-off to balance the amount of TLP and ILP between multi-thre...
Magnus Ekman, Per Stenström
ASPLOS
2011
ACM
14 years 9 months ago
NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories
nt, user-defined objects present an attractive abstraction for working with non-volatile program state. However, the slow speed of persistent storage (i.e., disk) has restricted ...
Joel Coburn, Adrian M. Caulfield, Ameen Akel, Laur...
ICONIP
2008
15 years 7 months ago
Improvement of Practical Recurrent Learning Method and Application to a Pattern Classification Task
Practical Recurrent Learning (PRL) has been proposed as a simple learning algorithm for recurrent neural networks[1][2]. This algorithm enables learning with practical order O(n2 )...
Mohamad Faizal Bin Samsudin, Katsunari Shibata
ICPP
2003
IEEE
15 years 11 months ago
Enabling Partial Cache Line Prefetching Through Data Compression
Hardware prefetching is a simple and effective technique for hiding cache miss latency and thus improving the overall performance. However, it comes with addition of prefetch buff...
Youtao Zhang, Rajiv Gupta