Several problems in text categorization are too hard to be solved adequately by standard bag-of-words representations. Work in kernel-based learning has approached this problem either by (i) exploiting information about the syntactic structure of the input or by (ii) incorporating knowledge about the semantic similarity of term features. In this paper, we propose a generalized framework consisting of a family of kernels that jointly incorporates syntax and semantics. We show that both components can be flexibly adapted and tuned to the particular application domain. We demonstrate the power of this approach in a series of experiments on two diverse datasets, each of which presents a non-standard text categorization problem: one on the classification of natural language questions from a TREC question answering dataset, and the other on the automated assignment of ICD-9 categories to short textual fragments of medical diagnoses.
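As a rough illustration only (a minimal sketch, not the exact kernel family defined in this paper), one way to combine the two components is to start from a standard convolution tree kernel over parse trees and relax the exact match on terminal (word) nodes to a semantic term similarity; the decay parameter \lambda and the similarity \sigma below are assumptions of this sketch:

K(T_1, T_2) = \sum_{n_1 \in N_{T_1}} \sum_{n_2 \in N_{T_2}} \Delta(n_1, n_2),

where, for terminal nodes carrying words w_{n_1} and w_{n_2},

\Delta(n_1, n_2) = \lambda \, \sigma(w_{n_1}, w_{n_2}),

and, for internal nodes with matching productions,

\Delta(n_1, n_2) = \lambda \prod_{j=1}^{nc(n_1)} \big( 1 + \Delta(\mathrm{ch}_j(n_1), \mathrm{ch}_j(n_2)) \big).

Here \sigma is assumed to be a positive semi-definite lexical similarity (e.g., derived from a lexical resource or from corpus statistics), which guarantees that K remains a valid kernel; setting \sigma(w_1, w_2) = 1 iff w_1 = w_2 recovers the purely syntactic tree kernel.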