Optimizing jobs timeouts on clusters and production grids

This paper presents a method to optimize the timeout value of computing jobs. It relies on a model of the job execution time that considers the job management system latency through a random variable. It also takes into account a proportion of outliers to model either reliable clusters or production grids characterized by faults causing jobs loss. Job management systems are first studied considering classical distributions. Different behaviors are exhibited, depending on the weight of the tail of the distribution and on the amount of outliers. Experimental results are then shown based on the latency distribution and outlier ratios measured on the EGEE grid infrastructure. Those results show that using the optimal timeout value provided by our
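The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and it rests on assumptions that the abstract does not state: a job that reaches the timeout is cancelled and resubmitted, attempts are independent, a fraction `outlier_ratio` of attempts never returns, and the latency of the remaining attempts follows a known CDF. Under those assumptions, with F the latency CDF, ρ the outlier ratio and t the timeout, the expected total latency is t / ((1 - ρ) F(t)) minus (1 / F(t)) times the integral of F over [0, t]. All function names and the log-normal parameters below are hypothetical and are not values measured on EGEE.

```python
import math


def expected_time(timeout, cdf, outlier_ratio, steps=2000):
    """Expected total latency when a job is cancelled and resubmitted at `timeout`.

    Assumes independent attempts; `cdf` is the latency CDF of non-outlier jobs,
    and a fraction `outlier_ratio` of attempts never returns.
    """
    f_t = cdf(timeout)
    p_success = (1.0 - outlier_ratio) * f_t  # one attempt ends before the timeout
    if p_success <= 0.0:
        return math.inf
    # Midpoint rule for the integral of cdf over [0, timeout]; never evaluates cdf(0).
    h = timeout / steps
    integral_cdf = h * sum(cdf((i + 0.5) * h) for i in range(steps))
    # Mean latency of an attempt known to finish in time:
    # (1 / F(t)) * integral of u f(u) du = t - (1 / F(t)) * integral of F(u) du.
    mean_success = timeout - integral_cdf / f_t
    # Each failed attempt costs a full timeout; the mean number of failures
    # before the first success is (1 - p) / p.
    mean_failures = (1.0 - p_success) / p_success
    return mean_success + timeout * mean_failures


def lognormal_cdf(mu, sigma):
    """Return the CDF of a log-normal distribution as a function of t > 0."""
    return lambda t: 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def best_timeout(cdf, outlier_ratio, candidates):
    """Grid search: the candidate timeout with the smallest expected latency."""
    return min(candidates, key=lambda t: expected_time(t, cdf, outlier_ratio))


if __name__ == "__main__":
    cdf = lognormal_cdf(mu=5.5, sigma=1.0)    # hypothetical parameters, in seconds
    grid = [50.0 * k for k in range(1, 201)]  # candidate timeouts: 50 s to 10000 s
    t_opt = best_timeout(cdf, outlier_ratio=0.05, candidates=grid)
    print(t_opt, expected_time(t_opt, cdf, 0.05))
```

A useful sanity check on this sketch: for an exponential latency with no outliers, the formula returns the same expected latency for every timeout, as memorylessness requires. A grid search is used here only for simplicity; any one-dimensional minimizer could replace it.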

Tristan Glatard, Xavier Pennec

CCGRID 2007 | Cluster Computing | Job Execution Time | Job Management | Timeout Value

Post Info

Added	02 Jun 2010
Updated	02 Jun 2010
Type	Conference
Year	2007
Where	CCGRID
Authors	Tristan Glatard, Xavier Pennec
