The embedded-block coding with optimized truncation (EBCOT), which consists of a bit-plane coder (BPC) and a binary arithmetic coder (BAC), is the bottleneck in realizing a high-performance JPEG2000 encoding system due to its characteristics of bit-wise processing. Although efficient architectures for BPCs have been presented, the performance of EBCOT is mainly restricted by BAC because of its sequential processing nature. In this paper, we propose a novel architecture for BAC which is based on our optimization technique named as trace pipelining. It enables parallel processing of the usual byte-out cases in BAC, and can achieve a throughput of 534 M symbols/sec, which is the highest compared to those of the previous BAC architectures.