Modern chip multiprocessors (CMPs) are designed to exploit both instruction-level parallelism (ILP) within processors and thread-level parallelism (TLP) within and across processors. However, the number of processors and the granularity of each processor are fixed at design time. This paper evaluates a flexible architectural approach, called Composable Lightweight Processors (CLPs), that allows simple, low-power cores to be aggregated dynamically into larger, more powerful single-threaded processors without changing the application binary. We evaluate one such 32-core design, called TFlex, which can be configured as 32 dual-issue processors, as a single 64-wide-issue processor, or as any point in between. The use of an Explicit Data Graph Execution (EDGE) ISA enables the system to be fully composable, with no monolithic structures spanning the cores. Simulation results show that CLPs achieve an average performance boost of 42%, an average area-efficiency of 3.4...
Changkyu Kim, Simha Sethumadhavan, M. S. Govindan,