The execution performance of an information gathering plan can suffer significantly due to remote I/O latencies. A streaming dataflow model of execution addresses the problem to some extent by exploiting every opportunity for parallel execution that the data dependencies in a plan allow. Unfortunately, plans that integrate information from multiple sources often use the results of one operation to form the queries for a subsequent operation. Such cases require sequential execution, an inefficiency that can erase the gains made through techniques like streaming dataflow. To address this problem, we present a technique called speculative plan execution, an out-of-order method that uses knowledge gained from prior executions to overcome the remaining data dependencies between plan operators. Our approach inserts additional plan operators that generate and confirm speculative results, while preserving the safety and fairness of overall execution...
Greg Barish, Craig A. Knoblock
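
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the operator names (`geocode`, `lookup_listings`), the `SpeculativePlan` class, and the simple lookup-table predictor are all hypothetical, and `time.sleep` stands in for remote I/O latency. The sketch assumes a two-step plan in which the second operation is queried with the first operation's result. It guesses that result from a prior execution, starts the dependent operation early, and releases the speculative answer only after the guess is confirmed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical remote operators; the sleeps simulate remote I/O latency.
def geocode(address):
    time.sleep(0.01)
    return {"1 Main St": "90292"}.get(address, "00000")


def lookup_listings(zip_code):
    time.sleep(0.01)
    return [f"listing-in-{zip_code}"]


class SpeculativePlan:
    """Two-step plan: geocode(address) -> lookup_listings(zip_code)."""

    def __init__(self):
        # Knowledge from prior executions: address -> zip code last observed.
        self.history = {}

    def run(self, address):
        """Return (listings, used_speculation)."""
        with ThreadPoolExecutor(max_workers=2) as pool:
            # Start the first operation as usual.
            actual = pool.submit(geocode, address)

            # "Speculate": if a prior run saw this address, start the
            # dependent operation now with the predicted zip code.
            guess = self.history.get(address)
            speculative = None
            if guess is not None:
                speculative = pool.submit(lookup_listings, guess)

            # "Confirm": wait for the real answer and compare it to the guess.
            zip_code = actual.result()
            self.history[address] = zip_code

            if speculative is not None and guess == zip_code:
                # Guess confirmed: the early result is safe to release.
                return speculative.result(), True

            # No guess or a wrong guess: discard any speculative result
            # and execute the dependent operation sequentially.
            return lookup_listings(zip_code), False
```

On a first call such as `SpeculativePlan().run("1 Main St")` there is no history, so the plan runs sequentially. A repeated call on the same object can overlap the two simulated remote calls. A wrong guess costs only the wasted speculative request, because the unconfirmed result is never returned.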