Pre-execution techniques have received much attention as an effective way of prefetching cache blocks to tolerate the everincreasing memory latency. A number of pre-execution tech...
Dongkeun Kim, Shih-Wei Liao, Perry H. Wang, Juan d...
In this paper, we present a technique for reducing the overhead of collecting path profiles in the context of a dynamic optimizer. The key idea to our approach, called Targeted Pa...
Efficient inter-thread value communication is essential for improving performance in Thread-Level Speculation (TLS). Although several mechanisms for improving value communication ...
Antonia Zhai, Christopher B. Colohan, J. Gregory S...
Instruction scheduling hardware can be simplified and easily pipelined if pairs of dependent instructions are fused so they share a single instruction scheduling slot. We study an...
Dynamic optimization systems store optimized or translated code in a software-managed code cache in order to maximize reuse of transformed code. Code caches store superblocks that...
Memory profiling is the process of characterizing a program's memory behavior by observing and recording its response to specific input sets. Relevant aspects of the program&...
To meet the high demand for powerful embedded processors, VLIW architectures are increasingly complex (e.g., multiple clusters), and moreover, they now run increasingly sophistica...
Dynamic binary translators use a two-phase approach to identify and optimize frequently executed code dynamically. In the first step (profiling phase), blocks of code are interpre...
Recent research has shown that programs often exhibit value locality. Such locality occurs when a code segment, although executed repeatedly in the program, takes only a small num...