Sciweavers

» Fault-tolerant solutions for a MPI compute intensive applica...
DFT
2000
IEEE
13 years 12 months ago
An Experimental Evaluation of the Effectiveness of Automatic Rule-Based Transformations for Safety-Critical Applications
Over the last years, an increasing number of safety-critical tasks have been demanded to computer systems. In particular, safety-critical computer-based applications are hitting ...
Maurizio Rebaudengo, Matteo Sonza Reorda, Marco To...
CISIS
2010
IEEE
14 years 2 months ago
Computational Grid as an Appropriate Infrastructure for Ultra Large Scale Software Intensive Systems
Ultra large scale (ULS) systems are future software intensive systems that have billions of lines of code, composed of heterogeneous, changing, inconsistent and independent elem...
Babak Rezaei Rad, Fereidoon Shams Aliee
HPDC
2010
IEEE
13 years 8 months ago
ROARS: a scalable repository for data intensive scientific computing
As scientific research becomes more data intensive, there is an increasing need for scalable, reliable, and high performance storage systems. Such data repositories must provide b...
Hoang Bui, Peter Bui, Patrick J. Flynn, Douglas Th...
HPDC
2009
IEEE
14 years 2 months ago
Interconnect agnostic checkpoint/restart in Open MPI
Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
Joshua Hursey, Timothy Mattox, Andrew Lumsdaine
ICA3PP
2010
Springer
13 years 7 months ago
Checkpointing and Migration of Communication Channels in Heterogeneous Grid Environments
A grid checkpointing service providing migration and transparent fault tolerance is important for distributed and parallel applications executed in heterogeneous grids. I...
John Mehnert-Spahn, Michael Schoettner