It is now common for multimedia applications to be partitioned and mapped onto multiple processing elements of a system-on-chip architecture. An important design constraint in such architectures is that the FIFO buffers connecting the processing elements (in a pipelined fashion) must not overflow and the playout buffer must never underflow. To meet these constraints, a common design practice is to increase the initial playout delay after which the output device starts reading from the playout buffer. Although this technique is straightforward to implement and involves only the computation of an appropriate playout delay, it has the drawback of requiring a large playout buffer. In this paper, instead of associating the playout delay solely with the output device, we propose to redistribute this delay among all the processing elements running the various tasks of the multimedia application. We show that this delay redistribution technique can significantly reduce (u...