We propose a new optimization algorithm, called the Generalized Baum-Welch (GBW) algorithm, for discriminative training of hidden Markov models (HMMs). GBW is based on Lagrange relaxation of a transformed optimization problem. We show that both the Baum-Welch (BW) algorithm for maximum likelihood (ML) estimation of HMM parameters and the popular extended Baum-Welch (EBW) algorithm for discriminative training are special cases of GBW. We compare the performance of GBW and EBW on Farsi large vocabulary continuous speech recognition (LVCSR).
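For context, a minimal sketch of the two standard objectives referred to above, in illustrative notation not taken from this paper: BW maximizes the likelihood of the training observations $X_r$ given their reference transcripts $W_r$, while EBW is commonly used to optimize a discriminative criterion such as maximum mutual information (MMI), in which competing hypotheses $W$ appear in the denominator and $P(W)$ denotes a language-model prior.

% Illustrative objectives (standard background, hedged notation):
% ML objective maximized by Baum-Welch
\[
  \mathcal{F}_{\mathrm{ML}}(\lambda) \;=\; \sum_{r} \log p_\lambda(X_r \mid W_r)
\]
% MMI-style discriminative objective commonly optimized with extended Baum-Welch
\[
  \mathcal{F}_{\mathrm{MMI}}(\lambda) \;=\; \sum_{r} \log
  \frac{p_\lambda(X_r \mid W_r)\, P(W_r)}
       {\sum_{W} p_\lambda(X_r \mid W)\, P(W)}
\]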