Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT