The problem of predicting a sequence $x_1, x_2, \ldots$ generated by a discrete source with unknown statistics is considered. Each letter $x_{t+1}$ is predicted using only the information in the word $x_1 x_2 \ldots x_t$. This problem is of great importance for data compression, because such predictions are used to estimate probability distributions in PPM algorithms and other adaptive codes. At the same time, prediction is a classical problem that has received much attention, and its history can be traced back to Laplace. We address the case where the sequence is generated by an independent and identically distributed (i.i.d.) source over a large (or even infinite) alphabet and suggest a class of new prediction methods.
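For context, the classical predictor attributed to Laplace (the rule of succession) estimates the probability of the next letter $a$ as $(n_a + 1)/(t + |A|)$, where $n_a$ is the number of occurrences of $a$ in $x_1 \ldots x_t$ and $|A|$ is the alphabet size. The sketch below is only an illustration of this classical baseline, not of the new methods proposed in the paper; the function name and example sequence are chosen for illustration.

```python
from collections import Counter

def laplace_predictor(sequence, alphabet):
    """Laplace's rule of succession: estimate P(next letter = a)
    as (count of a + 1) / (length of sequence + alphabet size)."""
    counts = Counter(sequence)
    t = len(sequence)
    k = len(alphabet)
    return {a: (counts[a] + 1) / (t + k) for a in alphabet}

# Example: predict the next letter after observing "aabab" over alphabet {a, b, c}
probs = laplace_predictor("aabab", alphabet="abc")
print(probs)  # {'a': 0.5, 'b': 0.375, 'c': 0.125}
```

Note that every letter of the alphabet, including letters never seen so far (such as `c` above), receives nonzero probability; for large or infinite alphabets this uniform smoothing becomes the main difficulty the paper addresses.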