Modern embedded CPU systems rely on a growing number of software features, but this growth enlarges the memory footprint and heightens the need for efficient instruction and data caches. The embedded operating system often juggles a changing set of tasks in round-robin fashion, which inevitably causes cache misses due to conflicts between tasks. Our technique reduces these misses by continuously monitoring CPU cache misses to grade the performance of running tasks. Through a series of step-wise refinements, our software system tunes the round-robin ordering to find a better temporal sequence for the tasks. This tuning is done dynamically during program execution and can therefore adapt to changes in workload or external input stimulus. The benefits of the technique are illustrated using an ARM processor running application benchmarks with different cache organizations and round-robin scheduling techniques.
Ken W. Batcher, Robert A. Walker
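
As a rough illustration of the idea summarized in the abstract, the sketch below tunes a round-robin order by trial swaps and keeps a swap only when the observed miss count drops. It is not the authors' implementation: the direct-mapped cache model, the adjacent-swap hill climb, the task footprints, and the names `count_misses` and `tune_order` are all assumptions made for this example. On real hardware the miss count would presumably come from performance counters rather than from a simulation.

```python
def count_misses(order, footprints, num_sets, rounds=8):
    """Count misses in a simulated direct-mapped cache (hypothetical model).

    order      -- task names in round-robin sequence
    footprints -- task name -> list of cache-line addresses the task touches
    num_sets   -- number of sets (one line per set, i.e. direct-mapped)
    """
    cache = [None] * num_sets  # line address currently resident in each set
    misses = 0
    for _ in range(rounds):
        for task in order:
            for line in footprints[task]:
                idx = line % num_sets
                if cache[idx] != line:
                    misses += 1
                    cache[idx] = line  # evict whatever was there
    return misses


def tune_order(order, footprints, num_sets):
    """Step-wise refinement: swap adjacent tasks, keep swaps that cut misses."""
    best = list(order)
    best_misses = count_misses(best, footprints, num_sets)
    improved = True
    while improved:  # terminates because the miss count strictly decreases
        improved = False
        for i in range(len(best) - 1):
            trial = best[:]
            trial[i], trial[i + 1] = trial[i + 1], trial[i]
            trial_misses = count_misses(trial, footprints, num_sets)
            if trial_misses < best_misses:
                best, best_misses = trial, trial_misses
                improved = True
    return best, best_misses


# Hypothetical workload: A and C share line 0; B and D conflict with it in set 0.
FOOTPRINTS = {"A": [0, 1], "B": [4, 2], "C": [0, 3], "D": [8]}
INITIAL_ORDER = ["A", "B", "C", "D"]
NUM_SETS = 4

if __name__ == "__main__":
    before = count_misses(INITIAL_ORDER, FOOTPRINTS, NUM_SETS)
    tuned, after = tune_order(INITIAL_ORDER, FOOTPRINTS, NUM_SETS)
    print("initial:", INITIAL_ORDER, before)
    print("tuned:  ", tuned, after)
```

In this toy workload the tuned order places the two tasks that share a line next to each other, so the second one finds the line still resident. In this simple model, ordering only matters when tasks share lines: without sharing, every other task runs between two slices of a given task regardless of the sequence.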