Sciweavers

64 search results - page 5 / 13
» Efficient Parallelization of Unstructured Reductions on Shar...
Sort
View
IPPS
1992
IEEE
13 years 11 months ago
CCHIME: A Cache Coherent Hybrid Interconnected Memory Extension
This paper presents a hybrid shared memory architecture which combines the scalability of a multistage interconnection network with the contention reduction benefits of coherent c...
Matthew K. Farrens, Arvin Park, Allison Woodruff
FPGA
2010
ACM
181views FPGA» more  FPGA 2010»
13 years 10 months ago
Efficient multi-ported memories for FPGAs
Multi-ported memories are challenging to implement with FPGAs since the provided block RAMs typically have only two ports. We present a thorough exploration of the design space of...
Charles Eric LaForest, J. Gregory Steffan
EUROPAR
2008
Springer
13 years 8 months ago
MPC: A Unified Parallel Runtime for Clusters of NUMA Machines
Over the last decade, Message Passing Interface (MPI) has become a very successful parallel programming environment for distributed memory architectures such as clusters. However, ...
Marc Pérache, Hervé Jourdren, Raymon...
EUROPAR
2010
Springer
13 years 7 months ago
Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations
Today, it is possible to associate multiple CPUs and multiple GPUs in a single shared memory architecture. Using these resources efficiently in a seamless way is a challenging issu...
Everton Hermann, Bruno Raffin, François Fau...
MICRO
2000
IEEE
137views Hardware» more  MICRO 2000»
13 years 11 months ago
Relational profiling: enabling thread-level parallelism in virtual machines
Virtual machine service threads can perform many tasks in parallel with program execution such as garbage collection, dynamic compilation, and profile collection and analysis. Har...
Timothy H. Heil, James E. Smith