This paper presents the Dynamic Simultaneous Multithreaded Architecture (DSMT). DSMT efficiently executes multiple threads from a single program on a SMT processor core. To accomplish this, threads are generated dynamically from a predictable flow of control and then executed speculatively. Data obtained during the single context nonspeculative execution phase of DSMT is used as a hint to speculate the posterior behavior of multiple threads. DSMT employs simple mechanisms based on state bits that keep track of inter-thread dependencies in registers and memory, synchronize thread execution, and control recovery from misspeculation. Moreover, DSMT utilizes a novel greedy policy for choosing those sections of code which provide the highest performance based on their past execution history. The DSMT architecture was simulated with a new cycle-accurate, execution-driven simulator. Our simulation results show that DSMT has very good potential to improve SMT performance, even when only a sin...