The implementation of bounded-delay services over integrated services networks relies on admission control mechanisms that in turn use end-to-end delay computation algorithms. For gu...
In past massively parallel processing systems, such as the TMC CM-5 and the CRI T3E, the scheduling problem consisted of allocating a single type of resource among the waiting job...
The probability of failures in software distributed shared memory (SDSM) increases as the system size grows. This paper introduces a new, efficient message logging technique, call...
This paper describes a technique called producer-push that enhances the performance of a page-based software distributed shared memory system. Shared data, in software DSM systems...
This paper describes transformation techniques for out-of-core programs (i.e., those that deal with very large quantities of data) based on exploiting locality using a combination...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
There has been much work recently on improving the locality performance of loop nests in scientific programs through the use of loop as well as data layout optimizations. However,...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
Trace-level reuse is based on the observation that some traces (dynamic sequences of instructions) are frequently repeated during the execution of a program, and in many cases, th...
We study the problem of exploiting parallelism from search-based AI systems on distributed machines. We propose stack-splitting, a technique for implementing or-parallelism, which ...
This paper describes the challenges presented by single-chip parallel media processors (PMPs). These machines integrate multiple parallel function units, instruction execution, and...