Memory profiling is the process of characterizing a program's memory behavior by observing and recording its response to specific input sets. Relevant aspects of the program&...
To meet the high demand for powerful embedded processors, VLIW architectures are increasingly complex (e.g., multiple clusters), and moreover, they now run increasingly sophistica...
Dynamic binary translators use a two-phase approach to identify and optimize frequently executed code dynamically. In the first step (profiling phase), blocks of code are interpre...
Recent research has shown that programs often exhibit value locality. Such locality occurs when a code segment, although executed repeatedly in the program, takes only a small num...
Operand gating is a technique for improving processor energy efficiency by gating off sections of the data path that are unneeded by short-precision (narrow) operands. A method fo...
In this paper, we describe a generalized approach to deriving a custom data layout in multiple memory banks for array-based computations, to facilitate high-bandwidth parallel mem...
Predicated execution enables the removal of branches by converting segments of branching code into sequences of conditional operations. An important side effect of this transforma...
Mikhail Smelyanskiy, Scott A. Mahlke, Edward S. Da...
Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or from the innermost loop to outer loops. In this paper, we propose a threestep ap...
Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or from the innermost loop to the outer loops. In a companion paper, we proposed a ...
Hongbo Rong, Alban Douillet, Ramaswamy Govindaraja...