We study an extension of the "standard" learning models to settings where observing the value of an attribute has an associated cost (which might be different for different attributes). Our model assumes that the correct classification is given by some target function f from a class of functions F; most of our results discuss the ability to learn a clause (an OR function of a subset of the variables) in various settings:

Offline: We are given both the function f and the distribution D that is used to generate an input x. The goal is to design a strategy that decides which attribute of x to observe next so as to minimize the expected evaluation cost of f(x). (In this setting there is no "learning" to be done, only an optimization problem to be solved; this problem is shown to be NP-hard, and hence approximation algorithms are presented.)

Distributional online: We study two types of "learning" problems; one where the target function f is known to the learner but th...