Sciweavers

2016 search results - page 253 / 404
» Distributed error confinement
MIDDLEWARE
2007
Springer
14 years 3 months ago
Using checkpointing to recover from poor multi-site parallel job scheduling decisions
Recent research in multi-site parallel job scheduling leverages user-provided estimates of job communication characteristics to effectively partition the job across multiple clus...
William M. Jones
DATE
2006
IEEE
14 years 3 months ago
Analyzing timing uncertainty in mesh-based clock architectures
Mesh architectures are used to distribute critical global signals on a chip, such as clock and power/ground. Redundancy created by mesh loops smooths out undesirable variations be...
Subodh M. Reddy, Gustavo R. Wilke, Rajeev Murgai
GLOBECOM
2006
IEEE
14 years 3 months ago
Service Differentiation by a Link Layer Protocol Based on SR ARQ over a Satellite Channel
This paper studies the case where multiple IP flows are aggregated over a single satellite channel and an error recovery by retransmissions is performed by Selective Repeat (SR...
Toshihiro Shikama, Takashi Watanabe, Tadanori Mizu...
HPDC
2006
IEEE
14 years 3 months ago
ALPS: An Application-Level Proportional-Share Scheduler
ALPS is a per-application user-level proportional-share scheduler that operates with low overhead and without any special kernel support. ALPS is useful to a range of applications...
Travis Newhouse, Joseph Pasquale
ICPP
2006
IEEE
14 years 3 months ago
A Performance Model of the Krak Hydrodynamics Application
We present an analytic performance model of a large-scale hydrodynamics code developed at Los Alamos National Laboratory. This modeling work is part of an ongoing effort to develop...
Kevin J. Barker, Scott Pakin, Darren J. Kerbyson