Speculative execution of information gathering plans can dramatically reduce the effect of source I/O latencies on overall performance. However, the utility of speculation is closely tied to how accurately data values are predicted at runtime. Caching is one approach to issuing future predictions, but it scales poorly with large data sources and cannot make intelligent predictions for previously unseen input data, even when there is an obvious general relationship between prior input and resulting output. In this paper, we describe a novel way to combine classification and transduction for a more efficient and accurate value prediction strategy, one that is capable of issuing predictions about previously unseen hints. We show how our approach results in significant speedups for plans that query multiple sources or sources that require multi-page navigation.
Greg Barish, Craig A. Knoblock