This paper describes Embra, a simulator for the processors, caches, and memory systems of uniprocessors and cache-coherent multiprocessors. When running as part of the SimOS simulation environment, Embra models the processors of a MIPS R3000/R4000 machine faithfully enough to run a commercial operating system and arbitrary user applications. To achieve high simulation speed, Embra uses dynamic binary translation to generate code sequences which simulate the workload. It is the first machine simulator to use this technique. Embra can simulate real workloads such as multiprocess compiles and the SPEC92 benchmarks running on Silicon Graphic's IRIX 5.3 at speeds only 3 to 9 times slower than native execution of the workload, making Embra the fastest reported complete machine simulator. Dynamic binary translation also gives Embra the flexibility to dynamically control both the simulation statistics reported and the simulation model accuracy with low performance overheads. For example,...