With the growing number of programmable processing elements in today's MultiProcessor System-on-Chip (MPSoC) designs, the synergy required for the development of the hardware architecture and the software running on them is also increasing. In MPSoC development environment, changes in the hardware architecture can bring in extensive re-partitioning or re-parallelization of the software architecture. Fast and accurate functional simulation and performance estimation techniques are needed to cope with this co-design problem at the early phases of MPSoC design space exploration. The current paper addresses this issue by introducing a framework which combines hybrid simulation, cache simulation and online trace-driven replay techniques to accurately predict performance of programmable elements in an MPSoC environment. The resulting simulation technique can easily cope with the continuous re-organizations of software architectures during an Instruction Set Simulator (ISS) based design...