We present a dynamic optimization technique, thread warping, that uses a single processor on a multiprocessor system to dynamically synthesize threads into custom accelerator circuits on FPGAs (field-programmable gate arrays). Building on dynamic synthesis for single-processor single-thread systems, known as warp processing, thread warping improves performances of multiprocessor systems by speeding up individual threads and by allowing more threads to execute concurrently. Furthermore, thread warping maintains the important separation of function from architecture, enabling portability of applications to architectures with different quantities of microprocessors and FPGA—an advantage not shared by static compilation/synthesis approaches. We introduce a framework of architecture, CAD tools, and operating system that together support thread warping. We summarize experiments on an extensive architectural simulation framework we developed, showing application speedups of 4x to 502x, ave...