The input-queued switch architecture is widely used in Internet routers because it can run at very high line speeds. A central problem in designing an input-queued switch is the scheduling algorithm, which decides which packets to transfer from ingress ports to egress ports in a given timeslot. It is desirable that such algorithms be iterative (so as to be pipelineable), distributed (allowing flexibility in hardware implementation), and able to deliver high performance (in terms of throughput and delay). In practice, implementable algorithms have so far had limited success in combining all of these properties. For example, the popular iSLIP [1] algorithm is known to perform suboptimally, but it is commercially deployed mainly because it is iterative and distributed. The main contribution of this paper is the design and systematic analysis of two algorithms which, to the best of our knowledge, are the first high-performance iterative and distributed scheduling algorith...