The task of computing binary prefix sums (BPS, for short) arises, for example, in expression evaluation, data and storage compaction, and routing. This paper describes a scalable VLSI architecture for the BPS problem. The central theme of this effort is the recognition that the broadcast delay incurred by a signal propagating along a bus is, at best, linear in the distance traversed. Thus, one of our design criteria is to keep buses as short as possible. In this context, our main contribution is to show that short buses can be used in conjunction with shift switching to obtain a scalable VLSI architecture for the BPS problem.