Sciweavers

» Parallel Spectral Clustering in Distributed Systems
ICPP
2009
IEEE
14 years 3 months ago
Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems
Clusters and applications continue to grow in size while their mean time between failure (MTBF) is getting smaller. Checkpoint/Restart is becoming increasingly important for lar...
Xiangyong Ouyang, Karthik Gopalakrishnan, Dhabales...
HIPC
2007
Springer
14 years 3 months ago
A Scalable Asynchronous Replication-Based Strategy for Fault Tolerant MPI Applications
As computational clusters increase in size, their mean-time-to-failure reduces. Typically checkpointing is used to minimize the loss of computation. Most checkpointing techniques, ...
John Paul Walters, Vipin Chaudhary
CANPC
2000
Springer
14 years 1 months ago
Transparent Network Connectivity in Dynamic Cluster Environments
Improvements in microprocessor and networking performance have made networks of workstations a very attractive platform for high-end parallel and distributed computing. However, t...
Xiaodong Fu, Hua Wang, Vijay Karamcheti
CLUSTER
2002
IEEE
13 years 8 months ago
Online Prediction of the Running Time of Tasks
We describe and evaluate the Running Time Advisor (RTA), a system that can predict the running time of a compute-bound task on a typical shared, unreserved commodity host...
Peter A. Dinda
ICPP
2006
IEEE
14 years 3 months ago
Data Transfers between Processes in an SMP System: Performance Study and Application to MPI
This paper focuses on the transfer of large data in SMP systems. Achieving good performance for intranode communication is critical for developing an efficient communication s...
Darius Buntinas, Guillaume Mercier, William Gropp