We present a semantic caching approach for optimizing the performance of information mediators. A critical problem with information mediators, particularly those that gather and integrate information from Web sources, is high query response time. This is because the data needed to answer user queries is spread across several different Web sources (and across several pages within a source), and retrieving, extracting, and integrating that data is time-consuming. We address this problem with a semantic caching approach in which we cache useful classes of information and define them as auxiliary data sources for the information mediator. The key challenge is to identify the content and schema of the classes of information that would be useful to cache as auxiliary sources. We present an algorithm that identifies such classes by analyzing patterns in user queries. The algorithm uses the knowledge representation (KR) system Loom to reason about which classes of information to cache. We describe an implementation of our...
Naveen Ashish, Craig A. Knoblock, Cyrus Shahabi
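The abstract does not spell out the algorithm, and the authors reason about classes of information with Loom. The Python sketch below only illustrates the general idea under simplifying assumptions of our own. A "class of information" is modeled as a conjunction of hypothetical `(attribute, value)` equality constraints. Subsumption is plain set inclusion, standing in for Loom's concept subsumption. "Analyzing patterns in user queries" is reduced to counting how many logged queries each candidate description covers. The names `select_classes`, `SemanticCache`, and `answer` are hypothetical and are not the authors' API.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Constraint = Tuple[str, str]         # hypothetical: (attribute, value)
Description = FrozenSet[Constraint]  # conjunction of equality constraints
Row = Dict[str, str]


def subsumes(general: Description, specific: Description) -> bool:
    """A conjunction of equality constraints subsumes another if it is a subset of it."""
    return general <= specific


def select_classes(query_log: List[Description], k: int) -> List[Description]:
    """Pick the k candidate descriptions that subsume the most logged queries.

    Candidates are the logged query descriptions plus each single constraint.
    Ties prefer the more specific (longer) description, then a stable sort order.
    """
    candidates = set(query_log)
    for q in query_log:
        candidates.update(frozenset([c]) for c in q)
    ranked = sorted(
        candidates,
        key=lambda d: (-sum(subsumes(d, q) for q in query_log), -len(d), sorted(d)),
    )
    return ranked[:k]


class SemanticCache:
    """Cached classes act as auxiliary sources that the mediator checks first."""

    def __init__(self) -> None:
        self.entries: List[Tuple[Description, List[Row]]] = []

    def add(self, description: Description, rows: List[Row]) -> None:
        self.entries.append((description, rows))

    def answer(self, query: Description) -> Optional[List[Row]]:
        for description, rows in self.entries:
            if subsumes(description, query):
                # The cached class covers the query: filter its rows locally.
                return [r for r in rows if all(r.get(a) == v for a, v in query)]
        return None  # cache miss: the mediator must query the remote Web sources


# Usage with a made-up query log and made-up cached rows.
log = [
    frozenset({("type", "restaurant"), ("city", "LA")}),
    frozenset({("type", "restaurant"), ("city", "NYC")}),
    frozenset({("type", "restaurant"), ("city", "LA"), ("cuisine", "thai")}),
    frozenset({("type", "hotel"), ("city", "SF")}),
]
best = select_classes(log, 1)  # the single constraint type=restaurant covers 3 queries

cache = SemanticCache()
cache.add(best[0], [
    {"type": "restaurant", "city": "LA", "name": "A"},
    {"type": "restaurant", "city": "NYC", "name": "B"},
    {"type": "restaurant", "city": "LA", "name": "C"},
])
hit = cache.answer(frozenset({("type", "restaurant"), ("city", "LA")}))
miss = cache.answer(frozenset({("type", "hotel")}))
```

Here a query for Los Angeles restaurants is answered entirely from the cached class, while a query about hotels falls through to the remote sources. A real system would also have to decide how much to cache and keep cached classes consistent with the underlying sources, which this sketch ignores.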