We present an application-driven customization methodology for energy-efficient inter-core communication in embedded multiprocessors. The methodology leverages configurable cache architectures and integrates software and hardware support to achieve energy-efficient data sharing between producer and consumer tasks. The technique is especially beneficial for data-streaming applications that exploit pipeline parallelism, where computational phases are mapped to separate processor cores. The application-driven data cache partitioning achieves low-power, low-latency (no coherence misses) inter-core data sharing. The basic premise of the proposed technique is to separate, through cache partitioning, the private data from the several shared data buffers used by each producer/consumer task. Such partitioning results in the following benefits: 1) Data cache accesses caused by the processor and the coherence mechanism need to access only a cache partition instead of the entire cache ...