We generalize the classical algorithms of Valiant and Haussler for learning conjunctions and disjunctions of Boolean attributes to the problem of learning these functions over arbitrary sets of features, including features constructed from the data. The result is a general-purpose learning machine, suitable for practical learning tasks, that we call the Set Covering Machine. We present a version of the Set Covering Machine that uses generalized balls for its set of data-dependent features and compare its performance with that of the Support Vector Machine. By extending a technique pioneered by Littlestone and Warmuth, we bound its generalization error as a function of the amount of data compression it achieves during training.
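
As a rough illustration of the idea, the sketch below greedily learns a conjunction of data-dependent ball features in the set-cover spirit the abstract describes. The specifics are illustrative assumptions rather than the paper's exact formulation: candidate balls are centred only on negative examples, each radius is shrunk just short of the nearest positive example, and a simple `penalty` parameter trades coverage of negatives against rejected positives.

```python
import numpy as np

def train_scm_conjunction(X_pos, X_neg, penalty=1.0, max_features=4):
    """Greedy set-cover sketch of SCM training (illustrative assumptions).

    Each candidate feature is a ball centred on a negative example; a point
    is rejected (feature outputs 0) when it falls inside the ball. Features
    are chosen greedily to cover (reject) as many still-uncovered negatives
    as possible, at a penalised cost for each positive they also reject.
    """
    balls = []                                   # chosen (center, radius) pairs
    uncovered = np.ones(len(X_neg), dtype=bool)  # negatives not yet rejected
    pos_kept = np.ones(len(X_pos), dtype=bool)   # positives not yet rejected
    while uncovered.any() and len(balls) < max_features:
        best, best_util = None, -np.inf
        for c in X_neg[uncovered]:
            # Radius heuristic (an assumption): stop just short of the
            # nearest positive, so that positive stays outside the ball.
            r = np.linalg.norm(X_pos - c, axis=1).min() * 0.99
            covers = np.linalg.norm(X_neg - c, axis=1) <= r
            errs = np.linalg.norm(X_pos - c, axis=1) <= r
            util = (covers & uncovered).sum() - penalty * (errs & pos_kept).sum()
            if util > best_util:
                best, best_util = (c, r, covers, errs), util
        c, r, covers, errs = best
        balls.append((c, r))
        uncovered &= ~covers
        pos_kept &= ~errs
    return balls

def predict(balls, X):
    """Conjunction: a point is classified positive iff every feature
    outputs 1, i.e. it lies outside every chosen ball."""
    out = np.ones(len(X), dtype=bool)
    for c, r in balls:
        out &= np.linalg.norm(X - c, axis=1) > r
    return out
```

Because each learned hypothesis is determined by the few training points that serve as ball centres and radius witnesses, the classifier achieves the kind of data compression to which the Littlestone and Warmuth style bound mentioned above applies.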