We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Finite Suffix Automata. The learning algorithm is motivated by real applications in man-machine interaction such as handwriting and speech recognition. Conventionally used fixed memory Markov and hidden Markov models have either severe practical or theoretical drawbacks. Though general hardness results are known for learning distributions generated by sources with similar structure, we prove that our algorithm can indeed efficiently learn distributions generated by our more restricted sources. In particular, we show that the KL-divergence between the distribution generated by the target source and the distribution generated by our hypothesis can be made small with high confidence in polynomial time and sample complexity. We demonstrate the applicability of our algorithm by learning the s...