The fully parallel LDPC decoding architecture can achieve high decoding throughput, but it suffers from high hardware complexity because it requires a large set of processing units (PUs) and complex interconnections. A practical approach to area-efficient decoders is the partially parallel architecture, in which a PU is shared among several rows or columns. In the partially parallel architecture, it is important to determine which rows or columns are processed in each PU and in what order. The dependencies between rows and columns should be considered so that decoding operations can be overlapped to minimize the overall processing time. This paper proposes an efficient scheduling algorithm, based on the concept of matrix permutation, that can be applied to general LDPC codes. Experimental results show that the proposed scheduling achieves a higher decoding rate, reducing processing time by 25% on average. A 1024-bit rate-1/2 LDPC decoder employing the proposed sch...