Sorting long sequences of keys is a problem that occurs in many different applications. For embedded systems, a uniprocessor software solution is often not applicable due to the low performance, while realizing multiprocessor sorting methods on parallel computers is much too expensive with respect to power consumption, physical weight, and cost. We investigate cost/performance tradeoffs for hybrid sorting algorithms that use a mixture of sequential merge sort and systolic insertion sort techniques. We propose a scalable architecture for integer sorting that consists of a uniprocessor and an FPGA-based parallel systolic co-processor. Speedups obtained analytically and experimentally and depending on hardware (cost) constraints are determined as a function of time constants of the uniprocessor and the co-processor.