The performance of question answering systems is evaluated through successive evaluation campaigns. A set of questions is given to the participating systems, which must find the correct answers in a collection of documents. The process used to create the questions may change from one evaluation to the next, which may entail an uncontrolled shift in question difficulty. For the QAst 2009 evaluation campaign, a new procedure was adopted to build the questions. A comparison of the QAst 2008 and QAst 2009 results shows a strong performance loss in 2009 for French and English, while the Spanish systems made progress overall. The measured loss might be related to this new way of elaborating questions. The general purpose of this paper is to propose a measure to calibrate the difficulty of a question set. In particular, a reasonable measure should output higher values for 2009 than for 2008. The proposed measure relies on a distance measure between the critical elements of a ques...