Application run-time information is a fundamental component in application and job scheduling. However, accurate predictions of run times are difficult to achieve for parallel app...
Large scale bioinformatics experiments are usually composed by a set of data flows generated by a chain of activities (programs or services) that may be modeled as scientific work...
We propose a new fault localization technique for software bugs in large-scale computing systems. Our technique always collects per-process function call traces of a target system...
We present Grouped Distributed Queues (GDQ), the first proportional share scheduler for multiprocessor systems that scales well with a large number of processors and processes. G...
The DAG-based task graph model has been found effective in scheduling for performance prediction and optimization of parallel applications. However the scheduling complexity and s...