Sciweavers

66 search results - page 5 / 14
» Thread Cluster Memory Scheduling: Exploiting Differences in ...
Sort
View
ISCA
2010
IEEE
185views Hardware» more  ISCA 2010»
14 years 19 days ago
Dynamic warp subdivision for integrated branch and memory divergence tolerance
SIMD organizations amortize the area and power of fetch, decode, and issue logic across multiple processing units in order to maximize throughput for a given area and power budget...
Jiayuan Meng, David Tarjan, Kevin Skadron
ASPDAC
2008
ACM
107views Hardware» more  ASPDAC 2008»
13 years 9 months ago
Enabling run-time memory data transfer optimizations at the system level with automated extraction of embedded software metadata
The information about the run-time behavior of software applications is crucial for enabling system level optimizations for embedded systems. This embedded Software Metadata inform...
Alexandros Bartzas, Miguel Peón Quiró...
IPPS
2010
IEEE
13 years 5 months ago
Structuring the execution of OpenMP applications for multicore architectures
Abstract--The now commonplace multi-core chips have introduced, by design, a deep hierarchy of memory and cache banks within parallel computers as a tradeoff between the user frien...
François Broquedis, Olivier Aumage, Brice G...
IWOMP
2009
Springer
14 years 3 days ago
Dynamic Task and Data Placement over NUMA Architectures: An OpenMP Runtime Perspective
Abstract. Exploiting the full computational power of current hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-...
François Broquedis, Nathalie Furmento, Bric...
EUROPAR
2008
Springer
13 years 9 months ago
MPC: A Unified Parallel Runtime for Clusters of NUMA Machines
Over the last decade, Message Passing Interface (MPI) has become a very successful parallel programming environment for distributed memory architectures such as clusters. However, ...
Marc Pérache, Hervé Jourdren, Raymon...