We demonstrate a phonotactic-semantic paradigm for spoken document categorization. In this framework, we define a set of acoustic words instead of lexical words to represent acoustic activities in spoken languages. The strategy for acoustic vocabulary selection is studied by comparing different feature selection methods. With an appropriate acoustic vocabulary, a voice tokenizer converts a spoken document into a text-like document of acoustic words. Thus, a spoken document can be represented by a count vector, named a bag-of-sounds vector, which characterizes a spoken document’s semantic domain. We study two phonotactic-semantic classifiers, the support vector machine classifier and the latent semantic analysis classifier, and their properties. The phonotactic-semantic framework constitutes a new paradigm in spoken document classification, as demonstrated by its success in the spoken language identification task. It achieves 18.2% error reduction over state-of-the-art benchmark perf...