Emerging application domains such as interactive vision, animation, and multimedia collaboration display dynamic scalable parallelism and high computational requirements, making them good candidates for execution on parallel architectures such as SMPs or clusters of SMPs. Apart from their main algorithmic components, these applications need specialized support mechanisms for plumbing different modules together, cross-module data transfer, automatic buffer management, synchronization, and so on. Such support mechanisms are usually part of the runtime system, and in this paper we quantify their performance. The runtime for our evaluation is Stampede, a cluster programming system designed to meet the requirements of such applications. We have developed a cycle-accurate timing infrastructure that helps tease out the time an application spends in different layers of software, viz., the main algorithmic component, the support mechanisms, and the raw messaging. We conducted...