Sciweavers

299 search results - page 25 / 60
» Workshop on Using Emerging Parallel Architectures for Comput...
CF
2010
ACM
14 years 18 days ago
Enabling a highly-scalable global address space model for petascale computing
Over the past decade, the trajectory to the petascale has been built on increased complexity and scale of the underlying parallel architectures. Meanwhile, software developers hav...
Vinod Tipparaju, Edoardo Aprà, Weikuan Yu, ...
ICS
2009
Tsinghua U.
14 years 2 months ago
Towards 100 Gbit/s Ethernet: multicore-based parallel communication protocol design
Ethernet line rates are projected to reach 100 Gbits/s by as soon as 2010. While in principle suitable for high performance clustered and parallel applications, Ethernet requires ...
Stavros Passas, Kostas Magoutis, Angelos Bilas
EUROPAR
2006
Springer
13 years 11 months ago
Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences
This paper presents a study of performance optimization of dense matrix multiplication on IBM Cyclops-64 (C64) chip architecture. Although much has been published on how t...
Ziang Hu, Juan del Cuvillo, Weirong Zhu, Guang R. ...
OOPSLA
2005
Springer
14 years 1 months ago
X10: an object-oriented approach to non-uniform cluster computing
It is now well established that the device scaling predicted by Moore’s Law is no longer a viable option for increasing the clock frequency of future uniprocessor systems at the...
Philippe Charles, Christian Grothoff, Vijay A. Sar...
IPPS
2010
IEEE
13 years 5 months ago
Offline library adaptation using automatically generated heuristics
Automatic tuning has emerged as a solution to provide high-performance libraries for fast changing, increasingly complex computer architectures. We distinguish offline adaptation (...
Frédéric de Mesmay, Yevgen Voronenko...