For embedded system development, several companies provide cross-platform development tools to aid in debugging, prototyping, and optimizing programs. These are full-system emulators that can emulate the final binary to be run on the real board, together with its operating system and devices. Many of these emulators do not provide cycle-level information because cycle-accurate simulation is so time-consuming. In this paper we propose a method that provides Cycle-Close Traces of cycle-level statistics for the complete execution of a program in orders of magnitude less time than full cycle-accurate simulation, with an average error of 3.2%. Our approach uses dynamic phase analysis to generate targeted cycle-close simulation samples. Detailed simulation results for these samples are used to produce fast cycle-close traces during a program’s execution, so the user can also watch, pause and debug the currently executing code and its corresponding architecture perfo...