Speculative execution of information gathering plans can dramatically reduce the effect of source I/O latencies on overall performance. However, the utility of speculation is closely tied to how accurately data values are predicted at runtime. Caching is one approach to issuing such predictions, but it scales poorly with large data sources and cannot make intelligent predictions for previously unseen input data, even when there is an obvious relationship between past inputs and the outputs they generated. In this paper, we describe a novel way to combine classification and transduction for a more efficient and accurate value prediction strategy, one capable of issuing predictions about previously unseen hints. We show how our approach results in significant speedups for plans that query multiple sources or sources that require multi-page navigation.
Greg Barish, Craig A. Knoblock