While performance, area, and power constraints have driven the design of current communication-enabled embedded systems, post-fabrication and run-time adaptability is now also required. The two dominant configurable hardware platforms are processors and FPGAs. However, for compute-intensive applications, neither platform delivers the needed performance at the desired low power. The need thus arises for custom, application-specific configurable (ASC) hardware. This paper addresses the optimization of ASC hardware. Our target application areas are multimedia and communication, where an incoming packet (task) is processed independently of other packets. We combine two concepts: external profiling and hardware threading. We use an M/M/c queueing model to profile task arrival patterns and show how profiling guides design decisions. We introduce the novel concept of hardware threading, which allows on-the-fly borrowing of unutilized hardware, thus maximizing task...
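
As a rough illustration of how an M/M/c model can inform a sizing decision, the sketch below evaluates the standard Erlang C formula for the probability that an arriving task must wait, and searches for the smallest number of servers that keeps this probability under a target. It is not the paper's profiling flow: the function names (`erlang_c`, `mean_queue_wait`, `min_servers`), the mapping of "servers" to hardware processing units, and the example rates are all assumptions made for illustration.

```python
from math import factorial


def erlang_c(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Probability that an arriving task has to queue in an M/M/c system."""
    offered_load = arrival_rate / service_rate   # a = lambda / mu
    utilization = offered_load / servers         # rho = a / c
    if utilization >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    # Term for the state in which all c servers are busy.
    busy_term = offered_load ** servers / factorial(servers) / (1.0 - utilization)
    # Terms for the states in which fewer than c servers are busy.
    idle_terms = sum(offered_load ** k / factorial(k) for k in range(servers))
    return busy_term / (idle_terms + busy_term)


def mean_queue_wait(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean time a task spends waiting in the queue (excluding service)."""
    wait_probability = erlang_c(arrival_rate, service_rate, servers)
    return wait_probability / (servers * service_rate - arrival_rate)


def min_servers(arrival_rate: float, service_rate: float, max_wait_probability: float) -> int:
    """Smallest c whose waiting probability does not exceed the target.

    Hypothetical helper: 'servers' stands in for identical hardware units.
    """
    if max_wait_probability <= 0.0:
        raise ValueError("target waiting probability must be positive")
    # Start from the smallest c that keeps the queue stable.
    servers = int(arrival_rate / service_rate) + 1
    while erlang_c(arrival_rate, service_rate, servers) > max_wait_probability:
        servers += 1
    return servers
```

For example, under these assumptions, `min_servers(1.0, 1.0, 0.1)` asks how many units are needed when tasks arrive at the same average rate at which one unit can serve them and at most one task in ten may wait.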