While the number of transistors on a chip increases exponentially over time, the productivity that can be realized from these systems has not kept pace. To deal with the complexity of modern systems, software developers are increasingly dependent on specialized development tools such as security profilers, memory leak identifiers, data flight recorders, and dynamic type analyzers. Many of these tools require full-system data that covers multiple interacting threads, processes, and processors. Reducing the performance penalty and complexity of these software tools is critical to those developing next-generation applications, and many researchers have proposed adding specialized hardware to assist in profiling and introspection. Unfortunately, while this additional hardware would be highly beneficial to developers, its cost must be paid on every single die that is manufactured. In this paper, we argue that a new way to attack this problem is with the addition o...