The deletion channel is the simplest point-to-point communication channel that models a lack of synchronization. Input bits are deleted independently with probability d; bits that are not deleted pass through the channel unchanged. Despite significant effort, little is known about the capacity of this channel, and even less about optimal coding schemes. In this paper we develop a new systematic approach to this problem by demonstrating that the capacity can be computed as a series expansion for small deletion probability. We compute the three leading terms of this expansion and find an input distribution that achieves capacity up to this order. This constitutes the first optimal coding result for the deletion channel. The key idea is the following: we understand perfectly the deletion channel with deletion probability d = 0. It has capacity 1, and the optimal input distribution is i.i.d. Bernoulli(1/2). It is natural to expect that the channel with small deletion probabili...