Today’s embedded applications often consist of multiple concurrent tasks. Each task is decomposed into subtasks, which are in turn assigned to and scheduled on multiple processors to achieve the optimal performance/energy combination. Previous work introduced systematic approaches that explore the performance-energy trade-offs of each individual task and use the exploration results at run-time to fulfill system-level constraints. However, these approaches did not exploit the fact that concurrent tasks can be executed in an overlapped fashion. In this paper, we propose a simple yet powerful on-line technique that performs task overlapping by run-time subtask re-scheduling. By doing so, a multiprocessor system with concurrent tasks can achieve better performance without extra energy consumption. We have applied our algorithm to a set of randomly generated task graphs, obtaining encouraging improvements over non-overlapped tasks, and also having less overall energy consumption ...