Although widely used, the Web Content Accessibility Guidelines (WCAG) have not been studied from the viewpoint of their validity and reliability. WCAG 2.0 explicitly claim that they are based on “testable” criteria, but no scientific evidence exists that this is actually the case. Validity (how well all and only the true problems can be identified) and reliability (the extent to which different evaluations of the same page lead to the same results) are key factors for the quality of accessibility evaluation methods. They need to be well studied and understood for methods and guidelines that are expected to have a major impact. This paper presents an experiment aimed at determining the validity and reliability of different checkpoints