As embedded systems grow rapidly in complexity, it is common to accelerate critical tasks in hardware. Designers usually rely on off-the-shelf components or licensed IP cores to shorten time to market, but hardware/software interfacing is tedious, error-prone, and usually not portable. Moreover, existing hardware seldom matches the requirements perfectly. CASCADE, the design environment proposed here as an alternative, generates coprocessing datapaths from the algorithms to be executed, specified in C/C++, and attaches these datapaths to the embedded processor through an auto-generated software driver. The number of datapaths and the number of parallel functional units inside each are scaled to fit the application. CASCADE integrates seamlessly with the design tools of the embedded processor, which reduces retraining and redesign effort and keeps product development time as short as that of pure software approaches. A JPEG encoder was built successfully in CASCADE with an auto-generated four-MAC accelerator to achie...