Sciweavers

482 search results - page 22 / 97
» A large-scale study of failures in high-performance computin...
SKG
2005
IEEE
15 years 9 months ago
An Agent-based Peer-to-Peer Grid Computing Architecture
The conventional computing Grid has developed a service oriented computing architecture with a superlocal resource management and scheduling strategy. This architecture is limited...
Jia Tang, Minjie Zhang
CCGRID
2003
IEEE
15 years 8 months ago
Fault Tolerance in Scalable Agent Support Systems: Integrating DARX in the AgentScape Framework
Open multi-agent systems need to cope with the characteristics of the Internet, e.g., dynamic availability of computational resources, latency, and diversity of services. Large-sc...
Benno J. Overeinder, Frances M. T. Brazier, Olivie...
CCGRID
2010
IEEE
15 years 4 months ago
The Failure Trace Archive: Enabling Comparative Analysis of Failures in Diverse Distributed Systems
With the increasing functionality and complexity of distributed systems, resource failures are inevitable. While numerous models and algorithms for dealing with failures exist, th...
Derrick Kondo, Bahman Javadi, Alexandru Iosup, Dic...
IWCC
1999
IEEE
15 years 7 months ago
Design and Analysis of the Alliance/University of New Mexico Roadrunner Linux SMP SuperCluster
This paper will discuss high performance clustering from a series of critical topics: architectural design, system software infrastructure, and programming environment. This will ...
David A. Bader, Arthur B. Maccabe, Jason R. Mastal...
SPE
2010
15 years 1 months ago
A survey of the research on power management techniques for high-performance systems
This paper surveys the research on power management techniques for high performance systems. These include both commercial high performance clusters and scientific high performanc...
Yongpeng Liu, Hong Zhu