Integrating Deep Web sources requires highly accurate semantic matches between the attributes of the source query interfaces. These matches are usually established by comparing the similarities of the attributes' labels and instances. However, attributes on query interfaces often have few or no data instances. This pervasive lack of instances seriously reduces the accuracy of current matching techniques. To address this problem, we describe WebIQ, a solution that learns from both the Surface Web and the Deep Web to automatically discover instances for interface attributes. For this purpose, WebIQ extends question answering techniques commonly used in the AI community. We describe how to incorporate WebIQ into current interface matching systems. Extensive experiments over five real-world domains show the utility of WebIQ. In particular, the results show that the acquired instances help improve matching accuracy from 89.5% to 97.5% F-1, at only a modest runtime overhead.
Wensheng Wu, AnHai Doan, Clement T. Yu