The existence of several web accessibility metrics may be evidence that there is no comparison framework showing how well they work and for which purposes they are appropriate. In this paper we aim to formulate such a framework, demonstrate that it is feasible, and report the findings we obtained when we applied it to seven existing automatic accessibility metrics. The framework covers the validity, reliability, sensitivity, adequacy and complexity of metrics in the context of four scenarios in which the metrics can be used. The experimental demonstration of the framework's viability is based on applying seven published metrics to more than 1500 web pages and then operationalizing the notions of validity-as-conformance,...