We describe a method for the fully automatic learning of hierarchical finite state translation models. The input to the method is transcribed speech utterances and their corresponding human translations, and the output is a set of head transducers, i.e. statistical lexical headoutward transducers. A word-alignment function and a head-ranking function are first obtained, and then counts are generated for hypothesized state transitions of head transducers whose lexical translations and word order changes are consistent with the alignment. The method has been applied to create an EnglishSpanish translation model for a speech translation application, with word accuracy of over 75% as measured by a string-distance comparison to three reference translations.