In this work we present an initial performance evaluation of Intel's latest, second-generation quad-core processor, Nehalem, and provide a comparison to first-generation AMD a...
Kevin J. Barker, Kei Davis, Adolfy Hoisie, Darren ...
The latency of broadcast/reduction operations has a significant impact on the performance of SIMD processors. This is especially true for associative programs, which make extensiv...
This paper is concerned with the scalability of large-scale grid monitoring and information services, which are mainly used for the discovery of resources of interest. Large-scale...
Fault-tolerance and lookup consistency are considered crucial properties for building applications on top of structured overlay networks. Many of these networks use the ring topol...
In this paper we argue that it is possible to couple the advantages of programming with the well-known abstraction of RPC with asynchronous programming models adequate for wide-ar...
The aim of this work is to introduce a computational cost system associated with a semantic framework for handling orthogonal data and control parallelism. In such a framework a pa...
The model of bulk-synchronous parallel computation (BSP) helps to implement portable general purpose algorithms while keeping predictable performance on different parallel compute...
Algorithmic skeletons are intended to simplify parallel programming by providing a higher level of abstraction than the usual message passing. Task and data parallel skeletons can be dist...
Parallel programming has proven to be an effective technique to improve the performance of computationally intensive applications. However, writing parallel programs is not easy, ...
Roberto Di Cosmo, Zheng Li, Susanna Pelagatti, Pie...