

Keyword search for data-centric XML collections with long text fields

Users who are unfamiliar with database query languages can search XML data sets using keyword queries. Current approaches for supporting such queries target either text-centric XML, where the structure is very simple and long text fields predominate, or data-centric XML, where the structure is very rich. However, long text fields are becoming more common in data-centric XML, and existing approaches deliver relatively poor precision, recall, and ranking for such data sets. In this paper, we introduce an XML keyword search method that provides high precision, recall, and ranking quality for data-centric XML, even when long text fields are present. Our approach is based on a new group of structural relationships called normalized term presence correlation (NTPC). In a one-time setup phase, we compute the NTPCs for a representative DB instance, then use this information to rank candidate answers for all subsequent queries, based on each answer’s structure. Our experiments with 65 user-su...
Arash Termehchy, Marianne Winslett
Added 12 Oct 2010
Updated 12 Oct 2010
Type Conference
Year 2010
Where EDBT
Authors Arash Termehchy, Marianne Winslett