This paper presents the influence of the loop nest splitting source code optimization on the worst-case execution time (WCET). Loop nest splitting minimizes the number of executed...
With the increasing gap between processor speed and memory latency, the performance of data-dominated programs are becoming more reliant on fast data access, which can be improved...
Deeply pipelined high performance processors require highly accurate branch prediction to drive their instruction fetch. However there remains a class of events which are not easi...
A technique for estimating the cost of executing a loop nest in parallel (parallel start-up overhead) is described in this paper. This technique is of utmost importance for paralle...
We present Lazy Binary Splitting (LBS), a user-level scheduler of nested parallelism for shared-memory multiprocessors that builds on existing Eager Binary Splitting work-stealing...
Alexandros Tzannes, George C. Caragea, Rajeev Baru...