This discussion paper describes a sequence of experiments with human subjects aimed at finding out how an NLG system should choose between the different forms of a gradable adjective. This case study highlights some general questions that one faces when trying to base NLG systems on empirical evidence: one question is what task to set a subject so as to obtain the most useful information about that subject; another concerns differences between subjects.