Sciweavers

164 search results - page 25 / 33
» GridMD: Program Architecture for Distributed Molecular Simul...
Sort
View
EUROPAR
2009
Springer
14 years 2 months ago
High Performance Matrix Multiplication on Many Cores
Moore’s Law suggests that the number of processing cores on a single chip increases exponentially. The future performance increases will be mainly extracted from thread-level par...
Nan Yuan, Yongbin Zhou, Guangming Tan, Junchao Zha...
ICPP
1998
IEEE
13 years 12 months ago
A memory-layout oriented run-time technique for locality optimization
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...
Yong Yan, Xiaodong Zhang, Zhao Zhang
IPPS
2000
IEEE
14 years 5 hour ago
Reducing Ownership Overhead for Load-Store Sequences in Cache-Coherent Multiprocessors
Parallel programs that modify shared data in a cachecoherent multiprocessor with a write-invalidate coherence protocol create ownership overhead in the form of ownership acquisiti...
Jim Nilsson, Fredrik Dahlgren
ICCAD
2001
IEEE
272views Hardware» more  ICCAD 2001»
14 years 4 months ago
NetBench: A Benchmarking Suite for Network Processors
— In this study we introduce NetBench, a benchmarking suite for network processors. NetBench contains a total of 9 applications that are representative of commercial applications...
Gokhan Memik, William H. Mangione-Smith, Wendong H...
ICA3PP
2010
Springer
14 years 12 days ago
Accelerating Euler Equations Numerical Solver on Graphics Processing Units
Abstract. Finite volume numerical methods have been widely studied, implemented and parallelized on multiprocessor systems or on clusters. Modern graphics processing units (GPU) pr...
Pierre Kestener, Frédéric Chât...