Timeline generation is a summarisation task which transforms a narrative, roughly chronological input text into a set of timestamped summary sentences, each expressing an atomic historical event. We present a methodology for evaluating systems which create such timelines, based on a novel corpus consisting of 36 human-created timelines. Our evaluation relies on deep semantic units which we call historical content units. An advantage of our approach is that it does not require human annotation of new system summaries.