Present-day application-specific embedded systems tend to choose instruction set extensions (ISEs) according to the limits imposed by the data bandwidth available to custom functional units (CFUs). In many cases, adopting the optimal ISE for an application would require a formidable cost increase to achieve the necessary data bandwidth. In this paper we propose a novel methodology for laying out data in memories, for building high-bandwidth memory systems out of existing low-bandwidth, low-cost ones, and for designing custom functional units, all with the desired data bandwidth and at only a fraction of the additional cost required by traditional techniques.

Categories and Subject Descriptors
B.2.4 [Arithmetic and Logic Structures]: Algorithms, Cost/performance; B.3.2 [Memory Structures]: Design styles; B.5.2 [Register-Transfer-Level Implementation]: Design, Arithmetic and Logic units, Control design, Data-path design, Memory design

General Terms
Algorithms, Performance, Design.

Keywo...