This paper presents an extensive and detailed experimental evaluation of XQuery processors. The study consists of running five publicly available XQuery benchmarks -- the Michigan benchmark (MBench), XBench, XMach-1, XMark and XOO7 -- on six XQuery processors: three stand-alone (file-based) XQuery processors (Galax, Qizx/Open, SaxonB) and three XML/XQuery database systems (BerkeleyDB/XML, MonetDB/XQuery, X-Hive/DB). Besides assessing and comparing the functionality, performance and scalability of the various systems, the major focus of this work is to report in detail on the experiences gained while performing such an exhaustive study, to discuss the problems we encountered and how we solved them, and thus to provide guidelines (or even a recipe) for performing reproducible large-scale experimental research and system evaluation.