This paper examines the architecture, algorithm, and implementation of a switch-based multiprocessor realization of the fast Fourier transform (FFT). The architecture employs M processing elements (PEs) and provides a speedup of M over systems that use a single PE. An algorithm is provided to detect and resolve memory conflicts. A CMOS implementation of a four-PE processor is presented. The design can be reconfigured to compute FFTs of various sizes, and its power consumption scales with the number of active PEs. Timing, area, and power results are discussed.
Bassam Jamil Mohd, Earl E. Swartzlander Jr.
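As a rough illustration of the speedup of M stated in the abstract, the following sketch counts butterfly cycles for an N-point FFT. It rests on assumptions that are not taken from the paper: a radix-2 decomposition, one butterfly per PE per cycle, and no stalls from memory conflicts. The function names `butterfly_cycles` and `speedup` are hypothetical.

```python
import math


def butterfly_cycles(n_points: int, num_pes: int) -> int:
    """Cycles needed for a radix-2 FFT, assuming one butterfly per PE per cycle.

    Assumption (not from the paper): stages run one after another and
    memory conflicts cause no stalls.
    """
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two >= 2")
    if num_pes < 1:
        raise ValueError("num_pes must be at least 1")
    stages = n_points.bit_length() - 1      # log2(N) stages
    butterflies_per_stage = n_points // 2   # N/2 butterflies in each stage
    return stages * math.ceil(butterflies_per_stage / num_pes)


def speedup(n_points: int, num_pes: int) -> float:
    """Ratio of single-PE cycles to multi-PE cycles under the same assumptions."""
    return butterfly_cycles(n_points, 1) / butterfly_cycles(n_points, num_pes)


if __name__ == "__main__":
    for m in (1, 2, 4):
        print(m, butterfly_cycles(1024, m), speedup(1024, m))
```

Under these assumptions the ratio equals M whenever M divides N/2, which matches the ideal figure quoted above; any cycles lost to unresolved memory conflicts would lower it.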