Abstract--On NVIDIA's many-core GPUs, there is no synchronization function among parallel thread blocks. When finegranularity of data communication and synchronization is requ...
Jianmin Chen, Zhuo Huang, Feiqi Su, Jih-Kwon Peir,...
Abstract—Transactional memory promises to generalize transactional programming to mainstream languages and data structures. The purported benefit of transactions is that they ar...
Binary instrumentation facilitates the insertion of additional code into an executable in order to observe or modify the executable's behavior. There are two main approaches t...
Michael Laurenzano, Mustafa M. Tikir, Laura Carrin...
— High-performance sockets implementations such as the Sockets Direct Protocol (SDP) have traditionally showed major performance advantages compared to the TCP/IP stack over Infi...
After many years of prefetching research, most commercially available systems support only two types of prefetching: software-directed prefetching and hardware-based prefetchers u...
Continuous circuit miniaturization and increased process variability point to a future with diminishing returns from dynamic voltage scaling. Operation below Vcc-min has been prop...
—Hadoop is a popular open-source implementation of MapReduce for the analysis of large datasets. To manage storage resources across the cluster, Hadoop uses a distributed user-le...
—In the past ten years, computer architecture has seen a paradigm shift from emphasizing single thread performance to energy efficient, throughput oriented, chip multiprocessors...
David Nellans, Kshitij Sudan, Rajeev Balasubramoni...
—Graphics processors (GPU) offer the promise of more than an order of magnitude speedup over conventional processors for certain non-graphics computations. Because the ften prese...
Henry Wong, Misel-Myrto Papadopoulou, Maryam Sadoo...