Sciweavers

643 search results - page 67 / 129
» Using Hardware Counters to Automatically Improve Memory Perf...
Sort
View
ASPLOS
2011
ACM
13 years 10 days ago
Sponge: portable stream programming on graphics engines
Graphics processing units (GPUs) provide a low cost platform for accelerating high performance computations. The introduction of new programming languages, such as CUDA and OpenCL...
Amir Hormati, Mehrzad Samadi, Mark Woh, Trevor N. ...
ISCA
2011
IEEE
324views Hardware» more  ISCA 2011»
13 years 13 days ago
Prefetch-aware shared resource management for multi-core systems
Chip multiprocessors (CMPs) share a large portion of the memory subsystem among multiple cores. Recent proposals have addressed high-performance and fair management of these share...
Eiman Ebrahimi, Chang Joo Lee, Onur Mutlu, Yale N....
ASPLOS
2011
ACM
13 years 10 days ago
RCDC: a relaxed consistency deterministic computer
Providing deterministic execution significantly simplifies the debugging, testing, replication, and deployment of multithreaded programs. Recent work has developed deterministic...
Joseph Devietti, Jacob Nelson, Tom Bergan, Luis Ce...
ISLPED
2003
ACM
91views Hardware» more  ISLPED 2003»
14 years 2 months ago
Reducing reorder buffer complexity through selective operand caching
Modern superscalar processors implement precise interrupts by using the Reorder Buffer (ROB). In some microarchitectures , such as the Intel P6, the ROB also serves as a repositor...
Gurhan Kucuk, Dmitry Ponomarev, Oguz Ergin, Kanad ...
ISCA
2006
IEEE
125views Hardware» more  ISCA 2006»
14 years 2 months ago
Architectural Semantics for Practical Transactional Memory
Transactional Memory (TM) simplifies parallel programming by allowing for parallel execution of atomic tasks. Thus far, TM systems have focused on implementing transactional stat...
Austen McDonald, JaeWoong Chung, Brian D. Carlstro...