In this work, we study similarity measures for text-centric XML documents based on an extended vector space model that considers both document content and structure. Experimental results on a benchmark show that the proposed measure outperforms a baseline that ignores the structural knowledge of XML documents.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – retrieval models.

General Terms
Models, Experimentation

Keywords
XML Search and Retrieval

Christopher C. Yang, Nan Liu
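
To make the contrast between a content-only baseline and a structure-aware vector space model concrete, the following is a minimal illustrative sketch. It is not the measure proposed in the paper, whose details the abstract does not give. It assumes one simple way of extending the vector space: each dimension is a hypothetical (element path, word) pair rather than a word alone, with raw counts as weights and cosine as the similarity. The function names and the two sample documents are invented for illustration.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def content_terms(xml_text):
    """Baseline vector: bag of words over all text, ignoring structure."""
    root = ET.fromstring(xml_text)
    return Counter(" ".join(root.itertext()).lower().split())


def structured_terms(xml_text):
    """Assumed extension: count (element path, word) pairs instead of words."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(elem, parent_path):
        path = parent_path + "/" + elem.tag
        for word in (elem.text or "").lower().split():
            counts[(path, word)] += 1
        for child in elem:
            walk(child, path)
            # Text following a child element belongs to the current element.
            for word in (child.tail or "").lower().split():
                counts[(path, word)] += 1

    walk(root, "")
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Two hypothetical documents with the same words in different elements.
doc_a = "<article><title>xml retrieval</title><body>vector space model</body></article>"
doc_b = "<article><title>vector space model</title><body>xml retrieval</body></article>"

content_sim = cosine(content_terms(doc_a), content_terms(doc_b))
structure_sim = cosine(structured_terms(doc_a), structured_terms(doc_b))
```

In this sketch the content-only baseline treats the two documents as essentially identical, because they contain the same words, while the path-qualified vectors share no dimension and so have zero similarity. A practical measure would likely sit between these extremes, for example by allowing partial matches between similar paths, but that design choice is an assumption here and not something stated in the abstract.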