In this work we consider the problem of universal prediction of individual sequences, where the universal predictor is a deterministic finite-state machine with a fixed, relatively small number of states. We examine the case of self-information loss, in which the predictions are probability assignments; this setting is equivalent to universal data compression. While previous results in this area are only asymptotic, we examine a class of machine structures and find an optimal method for allocating probabilities to the machine states, one that achieves minimal redundancy with respect to the class of constant predictors. We derive analytic bounds on the redundancy of machines from this class, and construct machines whose redundancy is arbitrarily close to these bounds. Finally, we compare our machines to previously proposed ones and show that our 300-state machine achieves smaller redundancy than the best previously known machine, which has 6000 states.
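For concreteness, the standard notions involved can be stated as follows (a sketch under assumed notation: a binary alphabet and the symbols $x_1^n$, $\hat{p}_t$, $n_1$ are illustrative here, not taken from the abstract itself). The self-information loss of a probability-assigning predictor, and its redundancy relative to the constant predictors, are commonly defined as

\[
L(x_1^n) \;=\; \sum_{t=1}^{n} -\log \hat{p}_t(x_t),
\qquad
R_n \;=\; \frac{1}{n}\, L(x_1^n) \;-\; h\!\left(\frac{n_1}{n}\right),
\]

where $\hat{p}_t(\cdot)$ is the probability the machine assigns to the next symbol in its current state, $n_1$ is the number of ones in $x_1^n$, and $h(q) = -q\log q - (1-q)\log(1-q)$ is the binary entropy function. The subtracted term is the per-symbol loss of the best constant predictor chosen in hindsight, so $R_n$ measures the excess loss incurred by the finite-state machine over that reference class.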