Anomalous events that affect the performance of networks are a fact of life. It is therefore not surprising that recent years have seen an explosion in research on network anomaly detection. What is quite surprising, however, is the lack of controlled evaluation of these detectors. In this paper we argue that there are numerous important questions regarding the effectiveness of anomaly detectors that cannot be answered by the evaluation techniques employed today. We present four central requirements of a rigorous evaluation that can only be met by simulating both the anomaly and its surrounding environment. While simulation is necessary, it is not sufficient. We therefore present an outline of an evaluation methodology that leverages both simulation and traces from operational networks. Categories and Subject Descriptors C.2.3 [Computer-Communication Networks]: Network Operations; C.4 [Performance of Systems]: General Terms Experimentation, Performance, Measurement