Sciweavers

ISCA
2010
IEEE
15 years 7 months ago
Dynamic warp subdivision for integrated branch and memory divergence tolerance
SIMD organizations amortize the area and power of fetch, decode, and issue logic across multiple processing units in order to maximize throughput for a given area and power budget...
Jiayuan Meng, David Tarjan, Kevin Skadron
ASPDAC
2008
ACM
15 years 4 months ago
Enabling run-time memory data transfer optimizations at the system level with automated extraction of embedded software metadata
The information about the run-time behavior of software applications is crucial for enabling system level optimizations for embedded systems. This embedded Software Metadata inform...
Alexandros Bartzas, Miguel Peón Quiró...
IPPS
2010
IEEE
15 years 16 days ago
Structuring the execution of OpenMP applications for multicore architectures
Abstract. The now commonplace multi-core chips have introduced, by design, a deep hierarchy of memory and cache banks within parallel computers as a tradeoff between the user frien...
François Broquedis, Olivier Aumage, Brice G...
IWOMP
2009
Springer
15 years 7 months ago
Dynamic Task and Data Placement over NUMA Architectures: An OpenMP Runtime Perspective
Abstract. Exploiting the full computational power of current hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-...
François Broquedis, Nathalie Furmento, Bric...
EUROPAR
2008
Springer
15 years 4 months ago
MPC: A Unified Parallel Runtime for Clusters of NUMA Machines
Over the last decade, Message Passing Interface (MPI) has become a very successful parallel programming environment for distributed memory architectures such as clusters. However, ...
Marc Pérache, Hervé Jourdren, Raymon...