—FPGAs are widely used for evaluating the error-floor performance of LDPC (low-density parity check) codes. We propose a scalable vector decoder for FPGA-based implementation of quasi-cyclic (QC) LDPC codes that takes advantage of the high bandwidth of the embedded memory blocks (called Block RAMs in a Xilinx FPGA) by packing multiple messages into the same word. We describe a vectorized overlapped message passing algorithm that results in 3.5X to 5.5X speedup over state-of-theart FPGA implementations in literature.