The design of webbases, database systems for supporting Web-based applications, is currently an active area of research. In this paper, we propose a 3-layer architecture for designing and implementing webbases for querying dynamic Web content (i.e., data that can only be extracted by filling out multiple forms). The lowest layer, the virtual physical layer, provides navigation independence by shielding the user from the complexities associated with retrieving data from raw Web sources. Next, the traditional logical layer supports site independence. The top layer is analogous to the external schema layer in traditional databases. Within this architectural framework we address two problems unique to webbases: retrieving dynamic Web content in the virtual physical layer and querying the external schema by the end user. The layered architecture makes it possible to automate data extraction to a much greater degree than in existing proposals. Wrappers for the virtual physical schema can ...
Hasan Davulcu, Juliana Freire, Michael Kifer, I. V