A Pilot Question Answering Task has been launched in the Cross-Language Evaluation Forum 2004 with a twofold objective. The first objective is to evaluate Question Answering systems when they must answer conjunctive list questions, disjunctive list questions, and questions with temporal restrictions. The second is to evaluate systems' capability to give an accurate self-score reflecting the confidence in their answers. To this end, two measures have been designed that apply to all these question types and reward systems whose confidence scores correlate highly with the human assessments. The forty-eight runs submitted to the Question Answering Main Track have been taken as a case study, confirming that some systems are able to give very accurate self-scores and showing how the measures reward this behaviour.
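As a rough illustration of the kind of measure described above, the sketch below computes a confidence-weighted score over a set of assessed answers. The exact measures used in the pilot task are defined later in the paper; the particular scheme here (correct answers count +1, incorrect answers -1, each weighted by the system's self-score in [0, 1]) is an assumption made only for this example.

```python
# Illustrative sketch (not the paper's exact definition): a confidence-weighted
# measure that rewards systems whose self-reported confidence correlates with
# the assessors' judgements. Assumed scheme: +1 for a correct answer, -1 for an
# incorrect one, each weighted by the system's self-score in [0, 1].

def confidence_weighted_score(answers):
    """answers: list of (self_score, is_correct) pairs, one per question."""
    if not answers:
        return 0.0
    total = 0.0
    for self_score, is_correct in answers:
        judgement = 1.0 if is_correct else -1.0
        total += self_score * judgement
    return total / len(answers)

# A well-calibrated system: high confidence on correct answers,
# low confidence on incorrect ones.
calibrated = [(0.9, True), (0.8, True), (0.1, False), (0.2, False)]
# An overconfident system: high confidence regardless of correctness.
overconfident = [(0.9, True), (0.9, False), (0.9, True), (0.9, False)]

print(confidence_weighted_score(calibrated))     # 0.35 -> rewarded
print(confidence_weighted_score(overconfident))  # 0.0  -> not rewarded
```

Under this assumed scheme, two systems with the same number of correct answers obtain different scores depending on how well their self-scores track correctness, which is the behaviour the measures are intended to reward.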