Speech can be represented as a time/frequency distribution of energy using a multi-band filter bank. For each segmental unit to be recognized, a Markov random field model is estimated that takes into account possible time asynchrony across the bands. The probability law of the speech process is given by a parametric Gibbs distribution, and a maximum-likelihood parameter estimation algorithm is developed. Experiments are conducted on an isolated-word recognition task. In the mono-band case, the new model is shown to perform similarly to standard HMM techniques. In the multi-band case, modeling inter-band synchrony is shown to be a promising way to improve performance as the number of bands increases.
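To make the input representation concrete, the sketch below computes a time/frequency energy map: one vector of band energies per analysis frame. It is an illustration under stated assumptions, not the authors' front end. The function names, the Hann window, the frame and hop sizes, and the use of contiguous equal-width groups of DFT bins as "bands" are all hypothetical choices made here; a real system would typically use a designed filter bank (for example, critical-band or mel-spaced filters).

```python
import cmath
import math


def frame_power_spectrum(frame):
    """Power |X[k]|^2 for bins k = 0..N/2 via a naive DFT (O(N^2), fine for a sketch)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_energies(signal, frame_len=256, hop=128, n_bands=4):
    """Return one list of n_bands energies per frame (a time/frequency energy map).

    Hypothetical front end: each "band" is a contiguous, equal-width group of
    DFT bins, standing in for a properly designed multi-band filter bank.
    """
    # Periodic Hann window to reduce spectral leakage between bands.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    n_bins = frame_len // 2 + 1
    # Bin boundaries splitting 0..N/2 into n_bands roughly equal groups.
    edges = [round(b * n_bins / n_bands) for b in range(n_bands + 1)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        power = frame_power_spectrum(frame)
        # Sum the power falling into each band for this frame.
        frames.append([sum(power[edges[b]:edges[b + 1]]) for b in range(n_bands)])
    return frames


if __name__ == "__main__":
    # A 1 kHz tone sampled at 8 kHz: its energy should land in the second band.
    fs = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(800)]
    for energies in band_energies(tone, frame_len=64, hop=32, n_bands=4)[:3]:
        print([round(e, 2) for e in energies])
```

Each row of the result is one frame and each column one band, so a per-band model sees a separate energy trajectory over time. Allowing those trajectories to be segmented with different time boundaries is the kind of inter-band asynchrony the abstract refers to.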