This paper presents a discriminative training (DT) approach to irrelevant variability normalization (IVN) based training of feature transforms and hidden Markov models (HMMs) for large-vocabulary continuous speech recognition. A speaker-clustering-based method is used for acoustic sniffing, and maximum mutual information (MMI) is used as the training criterion. Combined with unsupervised adaptation of feature transforms, the IVN-based DT approach achieves a 14.5% relative word error rate reduction over an MMI-trained baseline system on a Switchboard-1 conversational telephone speech transcription task.
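
For reference, the MMI criterion named above is commonly written in the general form below. This is a sketch of the textbook objective, not necessarily the exact formulation used with IVN in this paper; the symbols $\Theta$, $O_r$, $w_r$, $\kappa$, and $R$ are notation assumed here for illustration.

```latex
\[
\mathcal{F}_{\mathrm{MMI}}(\Theta)
  = \sum_{r=1}^{R} \log
    \frac{p_{\Theta}(O_r \mid w_r)^{\kappa} \, P(w_r)}
         {\sum_{w} p_{\Theta}(O_r \mid w)^{\kappa} \, P(w)}
\]
```

Here $O_r$ is the observation sequence of the $r$-th of $R$ training utterances, $w_r$ is its reference transcription, $P(w)$ is the language model probability, and $\kappa$ is an acoustic scaling factor. The denominator sums over competing hypotheses and is typically approximated with a lattice in practice. In an IVN setting, $\Theta$ would presumably cover both the feature-transform and HMM parameters.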