This paper presents a novel cycle-approximate performance estimation technique for automatically generated transaction level models (TLMs) for heterogeneous multicore designs. The inputs are application C processes and their mapping to processing units in the platform. The processing unit model consists of pipelined datapath, memory hierarchy and branch delay model. Using the processing unit model, the basic blocks in the C processes are analyzed and annotated with estimated delays. This is followed by a code generation phase where delay-annotated C code is generated and linked with a SystemC wrapper consisting of inter-process communication channels. The generated TLM is compiled and executed natively on the host machine. Our key contribution is that the estimation technique is close to cycle-accurate, it can be applied to any multi-core platform and it produces high-speed native compiled TLMs. For experiments, timed TLMs for industrial scale designs such as MP3 decoder were automati...