Text is a pervasive information type, and many applications require querying over text sources in addition to structured data. This paper studies the problem of query processing in a system that loosely integrates an extensible database system and a text retrieval system. We focus on a class of conjunctive queries that include joins between text and structured data, in addition to selections over these two types of data. We adapt techniques from distributed query processing and introduce a novel class of join methods based on probing that is especially useful for joins with text systems, and we present a cost model for the various alternative query processing methods. Experimental results con rm the utility of these methods. The space of query plans is extended due to the additional techniques, and we describe an optimization algorithm for searching this extended space. The techniques we describe in this paper are applicable to other types of external data managers loosely integrated ...
Surajit Chaudhuri, Umeshwar Dayal, Tak W. Yan