In this article, we report our efforts in mining the information encoded as clickthrough data in server logs to evaluate and monitor the relevance ranking quality of a commercial web search engine. We describe a metric called pSkip that quantifies ranking quality by estimating the probability that users encounter non-relevant results, which cost them the effort of reading and skipping. A search engine with a lower pSkip is regarded as having better ranking quality. A key design goal of pSkip is to integrate the findings from two sets of user studies: one that uses eye-tracking devices to track users’ browsing patterns on search result pages, and one that uses specially instrumented browsers to actively solicit users’ explicit judgments of their search activities. We present the derivation of the maximum likelihood estimate of pSkip and demonstrate its efficacy in describing the user study data. The mathematical properties of pSkip are further analyzed and compared with se...
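To make the idea of a skip probability estimated from clickthrough logs concrete, the sketch below uses a deliberately simplified toy model. It is not the derivation presented in this article. The model assumes, hypothetically, that a user scans results top-down and stops at the last clicked result. It also assumes that every result up to and including that point is an independent Bernoulli trial that is skipped with probability p. Under these assumptions the maximum likelihood estimate of p is simply the number of skipped results divided by the number of examined results. The function name `toy_pskip` and the session encoding are illustrative inventions.

```python
from typing import Sequence


def toy_pskip(sessions: Sequence[Sequence[bool]]) -> float:
    """Toy MLE of a skip probability from per-session click vectors.

    Hypothetical model (NOT the article's derivation): the user examines
    results top-down up to and including the last clicked result, and each
    examined result is an independent Bernoulli trial that is skipped
    (not clicked) with probability p. The MLE is then skips / examined.
    """
    skips = 0
    examined = 0
    for clicks in sessions:
        # clicks[i] is True if the result at rank i + 1 was clicked.
        if not any(clicks):
            # No click: the stopping point is unknown, so this toy model ignores the session.
            continue
        last = max(i for i, clicked in enumerate(clicks) if clicked)
        window = clicks[: last + 1]  # results assumed to have been examined
        examined += len(window)
        skips += sum(1 for clicked in window if not clicked)
    if examined == 0:
        raise ValueError("no sessions with clicks to estimate from")
    return skips / examined


if __name__ == "__main__":
    logs = [
        [False, True, False],   # skipped rank 1, clicked rank 2
        [True, False, False],   # clicked rank 1
        [False, False, True],   # skipped ranks 1-2, clicked rank 3
    ]
    print(toy_pskip(logs))  # 3 skips over 6 examined results
```

In this toy setting, a ranking that places relevant results higher produces fewer skips before each click and therefore a lower estimate. This mirrors the article's convention that a lower pSkip indicates better ranking quality.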