The durations of phonemic segments provide important cues for distinguishing words in languages such as Arabic. Recently, we proposed a discriminatively estimated joint acoustic, duration, and language model for large vocabulary speech recognition [1]. In that work, we found simple discrete models to be effective for modeling duration, although they were neither smoothed nor parsimonious. We address these limitations here with two alternative models: parametric and smoothed-discrete models. Unlike previous work on parametric duration models, we estimate their parameters discriminatively, and we derive an analytical expression for estimating the parameters of a log-normal distribution using a recent approach [2]. On a large vocabulary Arabic task, we empirically evaluate different segmental units and duration models. Our results show that bigrams of clustered states modeled with smoothed-discrete duration models are more accurate and efficient than the other models considered.