Kernel-based systems are currently among the most popular approaches to supervised learning. Unfortunately, the computational cost of training such systems grows rapidly with the number of training data points. Recently, a number of approximate methods for scaling kernel-based systems to large data sets have been introduced. In this paper we investigate the relationship among three of these approaches and compare their performance experimentally.