In an efficient JPEG2000 block coder, the memories that store the state variables dominate the hardware cost. In this paper, we propose a novel bit-plane coder (BPC) architecture that derives all state variables on the fly, thereby eliminating this memory requirement. In addition, we present a concurrent column-stripe coding algorithm that merges the scanning of all three coding passes into a single context window and generates all relevant context outputs concurrently in a single clock cycle. Experimental results show that the memory requirement and overall hardware cost of the proposed BPC are much smaller than those of previous architectures. Furthermore, because a column-stripe is encoded for all three passes in a single clock cycle, a minimum of four context outputs is generated per cycle. Therefore, the arithmetic coder that follows and encodes the BPC outputs is never idle, enabling fast computation of overall block ...
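
For readers unfamiliar with the scan that the column-stripe algorithm operates on, the following is a minimal software sketch of the standard JPEG2000 stripe-oriented scan order, in which each bit-plane is traversed in stripes four rows high, column by column within a stripe. It illustrates only the grouping of samples into column-stripes, not the proposed hardware or its context window; the function names `column_stripe_scan` and `column_stripes` are hypothetical and do not come from the paper.

```python
def column_stripe_scan(height, width, stripe_height=4):
    """Yield (row, col) sample positions in stripe-oriented scan order.

    Assumption: stripes are `stripe_height` rows high (4 in JPEG2000),
    and the last stripe may be shorter if `height` is not a multiple of it.
    """
    for top in range(0, height, stripe_height):
        bottom = min(top + stripe_height, height)
        for col in range(width):
            for row in range(top, bottom):
                yield (row, col)


def column_stripes(height, width, stripe_height=4):
    """Yield one list of sample positions per column-stripe.

    Each list is the group of samples that a column-based coder
    would consider together (illustrative grouping only).
    """
    for top in range(0, height, stripe_height):
        bottom = min(top + stripe_height, height)
        for col in range(width):
            yield [(row, col) for row in range(top, bottom)]


if __name__ == "__main__":
    # A 64x64 code block has 16 stripes of 64 columns each.
    print(sum(1 for _ in column_stripes(64, 64)))
    # The first column-stripe covers rows 0..3 of column 0.
    print(next(column_stripes(64, 64)))
```

Under these assumptions, a 64x64 code block contains 1024 column-stripes per bit-plane, so a coder that consumes one column-stripe per clock cycle would visit a bit-plane in 1024 cycles regardless of how the samples are distributed among the three passes.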