We achieved a state of the art performance in statistical machine translation by using a large number of features with an online large-margin training algorithm. The millions of parameters were tuned only on a small development set consisting of less than 1K sentences. Experiments on Arabic-toEnglish translation indicated that a model trained with sparse binary features outperformed a conventional SMT system with a small number of features.