We give a universal kernel that renders all the regular languages linearly separable. We are not able to compute this kernel efficiently and conjecture that it is intractable, but we do have an efficient ε-approximation.

1 Background

Since the advent of Support Vector Machines (SVMs), kernel methods have flourished in machine learning theory [7]. A kernel is a positive definite function from X × X to ℝ which, via Mercer's theorem, endows an abstract set X with the structure of a Hilbert space. Kernels provide both computational and theoretical power. The so-called kernel trick, when available, allows us to bypass computing the explicit embedding φ : X → ℝ^F in feature space via the identity K(x, y) = ⟨φ(x), φ(y)⟩; this can lead to a considerable gain in efficiency. On a more conceptual level, imposing an inner product structure on an abstract set allows us to harness the theoretical and computational utility of linear algebra and convex optimization. A concrete example where k...