In this paper we describe the design and implementation of a flexible and extensible just-in-time ARM simulator designed to run co-operatively with a multi-core DSP simulator on x86 hosts. The integrated simulator can boot ARM/Linux alongside another operating system running on the DSP cores, thereby supporting a truly heterogeneous multi-core operating environment. In addition, the simulator facilitates exploration of system design parameters, such as memory latencies and cache organization, via lightweight user-defined instrumentation. We provide performance results and highlight the impact of design choices on overall performance and on our design objectives. We also discuss implementation techniques and the trade-offs between the competing requirements of simulation speed and accuracy in a complex multi-core simulation environment.

Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors--Code generation, Optimizations, Incremental compilers, Run-time environmen...