Sciweavers

IPPS
2010
IEEE
Inter-block GPU communication via fast barrier synchronization
The graphics processing unit (GPU) has evolved from a fixed-function processor with programmable stages to a programmable processor with many fixed-function components that deliver...
Shucai Xiao, Wu-chun Feng
ICS
2003
Tsinghua U.
Roccom: an object-oriented, data-centric software integration framework for multiphysics simulations
We describe an object-oriented software integration framework, Roccom, abstracted from our five years of experience in developing a complex, integrated code for rocket simulation. Roccom...
Xiangmin Jiao, Michael T. Campbell, Michael T. Hea...
ISCA
1998
IEEE
Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture
Explicitly Parallel Instruction Computing (EPIC) architectures require the compiler to express program instruction level parallelism directly to the hardware. EPIC techniques whic...
David I. August, Daniel A. Connors, Scott A. Mahlk...
EUROPAR
2009
Springer
Fast and Efficient Synchronization and Communication Collective Primitives for Dual Cell-Based Blades
The Cell Broadband Engine (Cell BE) is a heterogeneous multi-core processor specifically designed to exploit thread-level parallelism. Its memory model comprises a common shared ...
Epifanio Gaona, Juan Fernández, Manuel E. A...
HPDC
2010
IEEE
Multi-GPU volume rendering using MapReduce
In this paper we present a multi-GPU parallel volume rendering implementation built using the MapReduce programming model. We give implementation details of the library, including s...
Jeff A. Stuart, Cheng-Kai Chen, Kwan-Liu Ma, John ...