Sciweavers

37 search results - page 6 / 8
» Application Acceleration with the Explicitly Parallel Operat...
Sort
View
HPCA
2011
IEEE
13 years 8 days ago
HAQu: Hardware-accelerated queueing for fine-grained threading on a chip multiprocessor
Queues are commonly used in multithreaded programs for synchronization and communication. However, because software queues tend to be too expensive to support finegrained paralle...
Sanghoon Lee, Devesh Tiwari, Yan Solihin, James Tu...
SIAMCOMP
2010
109views more  SIAMCOMP 2010»
13 years 3 months ago
Analysis of Delays Caused by Local Synchronization
Synchronization is often necessary in parallel computing, but it can create delays whenever the receiving processor is idle, waiting for the information to arrive. This is especia...
Julia Lipman, Quentin F. Stout
ISPASS
2007
IEEE
14 years 2 months ago
CA-RAM: A High-Performance Memory Substrate for Search-Intensive Applications
This paper proposes a specialized memory structure called CA-RAM (Content Addressable Random Access Memory) to accelerate search operations present in many important real-world ap...
Sangyeun Cho, Joel R. Martin, Ruibin Xu, Mohammad ...
EUROSYS
2010
ACM
14 years 5 months ago
A Comprehensive Scheduler for Asymmetric Multicore Systems
Symmetric-ISA (instruction set architecture) asymmetricperformance multicore processors were shown to deliver higher performance per watt and area for codes with diverse architect...
Juan Carlos Saez, Manuel Prieto Matias, Alexandra ...
ASPLOS
2009
ACM
14 years 9 months ago
QR decomposition on GPUs
QR decomposition is a computationally intensive linear algebra operation that factors a matrix A into the product of a unitary matrix Q and upper triangular matrix R. Adaptive sys...
Andrew Kerr, Dan Campbell, Mark Richards