Dependencies between iterations of loop structures cannot always be determined at compile-time because they may depend on input data which is known only at run-time. A prime examp...
V. Prasad Krothapalli, Thulasiraman Jeyaraman, Mar...
Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or from the innermost loop to outer loops. In this paper, we propose a threestep ap...
The transition to multithreaded, multi-core designs places a greater responsibility on programmers and software for improving performance; thread-level parallelism (TLP) will be i...
Tipp Moseley, Daniel A. Connors, Dirk Grunwald, Ra...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effe...
Global locality analysis is a technique for improving the cache performance of a sequence of loop nests through a combination of loop and data layout optimizations. Pure loop tran...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
Both inherently sequential code and limitations of analysis techniques prevent full parallelization of many applications by parallelizing compilers. Amdahl's Law tells us tha...
Most of today's multiprocessors have a DistributedShared Memory (DSM) organization, which enables scalability while retaining the convenience of the shared-memory programming...
Angeles G. Navarro, Rafael Asenjo, Emilio L. Zapat...
This paper describes transformation techniques for out-of-core programs (i.e., those that deal with very large quantities of data) based on exploiting locality using a combination...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
This paper presents a technique, called loop dissevering, to temporally partitioning any type of loop presented in programming languages. The technique can be used in the presence...
Abstract. A loop with irregular assignment computations contains loopcarried output data dependences that can only be detected at run-time. In this paper, a load-balanced method ba...