Sciweavers

656 search results - page 112 / 132
» Scalable Parallel Matrix Multiplication on Distributed Memor...
Sort
View
IPPS
2006
IEEE
14 years 2 months ago
Compiler assisted dynamic management of registers for network processors
Modern network processors support high levels of parallelism in packet processing by supporting multiple threads that execute on a micro-engine. Threads switch context upon encoun...
R. Collins, Fernando Alegre, Xiaotong Zhuang, Sant...
ISPASS
2009
IEEE
14 years 3 months ago
Analyzing CUDA workloads using a detailed GPU simulator
Modern Graphic Processing Units (GPUs) provide sufficiently flexible programming models that understanding their performance can provide insight in designing tomorrow’s manyco...
Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, He...
ICDCS
2009
IEEE
14 years 5 months ago
REMO: Resource-Aware Application State Monitoring for Large-Scale Distributed Systems
To observe, analyze and control large scale distributed systems and the applications hosted on them, there is an increasing need to continuously monitor performance attributes of ...
Shicong Meng, Srinivas R. Kashyap, Chitra Venkatra...
IPPS
2007
IEEE
14 years 2 months ago
Load Miss Prediction - Exploiting Power Performance Trade-offs
— Modern CPUs operate at GHz frequencies, but the latencies of memory accesses are still relatively large, in the order of hundreds of cycles. Deeper cache hierarchies with large...
Konrad Malkowski, Greg M. Link, Padma Raghavan, Ma...
ICPADS
2010
IEEE
13 years 6 months ago
Data-Aware Task Scheduling on Multi-accelerator Based Platforms
To fully tap into the potential of heterogeneous machines composed of multicore processors and multiple accelerators, simple offloading approaches in which the main trunk of the ap...
Cédric Augonnet, Jérôme Clet-O...