Sciweavers

420 search results - page 71 / 84
» Scalable Parallel Programming with CUDA
Sort
View
CCGRID
2011
IEEE
13 years 13 hour ago
Network-Friendly One-Sided Communication through Multinode Cooperation on Petascale Cray XT5 Systems
—One-sided communication is important to enable asynchronous communication and data movement for Global Address Space (GAS) programming models. Such communication is typically re...
Xinyu Que, Weikuan Yu, Vinod Tipparaju, Jeffrey S....
PPOPP
2009
ACM
14 years 9 months ago
A compiler-directed data prefetching scheme for chip multiprocessors
Data prefetching has been widely used in the past as a technique for hiding memory access latencies. However, data prefetching in multi-threaded applications running on chip multi...
Dhruva Chakrabarti, Mahmut T. Kandemir, Mustafa Ka...
VLDB
2007
ACM
145views Database» more  VLDB 2007»
14 years 8 months ago
Executing Stream Joins on the Cell Processor
Low-latency and high-throughput processing are key requirements of data stream management systems (DSMSs). Hence, multi-core processors that provide high aggregate processing capa...
Bugra Gedik, Philip S. Yu, Rajesh Bordawekar
IPPS
2008
IEEE
14 years 2 months ago
Overcoming scaling challenges in biomolecular simulations across multiple platforms
NAMD† is a portable parallel application for biomolecular simulations. NAMD pioneered the use of hybrid spatial and force decomposition, a technique now used by most scalable pr...
Abhinav Bhatele, Sameer Kumar, Chao Mei, James C. ...
HPCC
2009
Springer
14 years 6 days ago
Dynamically Filtering Thread-Local Variables in Lazy-Lazy Hardware Transactional Memory
Abstract--Transactional Memory (TM) is an emerging technology which promises to make parallel programming easier. However, to be efficient, underlying TM system should protect only...
Sutirtha Sanyal, Sourav Roy, Adrián Cristal...