This paper presents a low-complexity, high-speed 4-dimensional 8-ary Phase Shift Keying Trellis Coded Modulation (4-D 8PSK TCM) decoder. In the design, an efficient architecture for the transition metrics unit (TMU) is proposed to significantly reduce the computation complexity without degrading the performance. In addition, pipelining and parallel processing techniques are exploited to increase the decoding throughput. Synthesis results show that the FPGA implementation of the TCM decoder can