With transistor scaling continuing to follow Moore's Law and conventional out-of-order processors failing to scale, multi-core systems are becoming the design choice for industry. The burden of extracting performance is thus largely shifted from the hardware to the programmer and the compiler, who must now expose Thread Level Parallelism (TLP) to the underlying system in the form of explicitly parallel applications. Unfortunately, parallel programming is hard and error-prone: the programmer has to parallelize the work, perform the data placement, and deal with thread synchronization. Systems that support speculative multithreaded execution, such as Thread Level Speculation (TLS), offer an interesting alternative since they relieve the programmer from the burden of parallelizing applications and correctly synchronizing them. However, because systems that support speculative multithreading usually treat all threads equally, they are energy-inefficient. This inefficiency stems from t...