Exploiting instruction-level parallelism (ILP) is extremely important for achieving high performance in application-specific instruction set processors (ASIPs) and embedded processors. Existing techniques either schedule hardware pipelines to obtain higher throughput or software-pipeline loops to exploit greater ILP (software pipelining is an instruction scheduling technique for iterative computation). We integrate these techniques, co-scheduling hardware and software pipelines to achieve greater instruction throughput. In this paper, we develop the underlying theory of co-scheduling, called the Modulo-Scheduled Pipeline (or MS-Pipeline) theory. More specifically, we establish the necessary and sufficient condition for achieving the maximum throughput in a given pipeline operating under modulo scheduling. Further, we establish a sufficient condition to achieve a specified throughput, based on which we also develop a methodology for designing the hardware pipelines that achieve such a thro...
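To make "throughput of a pipeline under modulo scheduling" concrete, the sketch below uses the classic reservation-table collision check, not the MS-Pipeline conditions themselves, which the abstract does not spell out. It assumes a single operation class with a fixed reservation table, issued once every II cycles (so throughput is 1/II). The function names and the example table are hypothetical. A given II is collision-free exactly when, on every stage, the usage offsets are distinct modulo II. Equivalently, no forbidden latency is a multiple of II. The busiest stage gives a lower bound on II.

```python
from typing import Dict, List, Set

# Hypothetical reservation table: for each pipeline stage, the distinct cycle
# offsets (relative to issue) at which one operation occupies that stage.
ReservationTable = Dict[str, List[int]]


def forbidden_latencies(table: ReservationTable) -> Set[int]:
    """Issue distances at which a later operation would collide with an earlier one."""
    forbidden: Set[int] = set()
    for offsets in table.values():
        for a in offsets:
            for b in offsets:
                if a != b:
                    forbidden.add(abs(a - b))
    return forbidden


def collision_free(table: ReservationTable, ii: int) -> bool:
    """True if issuing one operation every `ii` cycles never puts two operations
    in the same stage in the same cycle, i.e. offsets are distinct mod ii per stage."""
    for offsets in table.values():
        slots = [t % ii for t in offsets]
        if len(slots) != len(set(slots)):
            return False
    return True


def resource_mii(table: ReservationTable) -> int:
    """Lower bound on II: the busiest stage needs that many distinct slots per II."""
    return max(len(offsets) for offsets in table.values())


def min_collision_free_ii(table: ReservationTable) -> int:
    """Smallest II >= resource_mii with no collisions (assumes distinct offsets per stage)."""
    # Once ii exceeds every within-stage span, all offsets are distinct mod ii.
    span = max(max(o) - min(o) for o in table.values())
    for ii in range(resource_mii(table), span + 2):
        if collision_free(table, ii):
            return ii
    raise ValueError("a stage lists the same offset more than once")


# Example: S1 is used twice, so II >= 2, but offsets 0 and 4 clash mod 2.
table: ReservationTable = {"S1": [0, 4], "S2": [1, 2], "S3": [3]}
print(forbidden_latencies(table))     # {1, 4}
print(min_collision_free_ii(table))   # 3
```

In this example the resource bound (II = 2) is not attainable, because the forbidden latency 4 is a multiple of 2. This is the kind of gap between the resource bound and the achievable throughput that a condition for maximum throughput has to rule out.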
Ramaswamy Govindarajan, Erik R. Altman, Guang R. G