Most traditional Information Retrieval (IR) systems, including web search engines, operationalize “relevant” as the frequency with which a set of keywords occurs in a document. Because ...
Hyun Woong Shin, Eduard H. Hovy, Dennis McLeod, La...
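The keyword-frequency notion of relevance described in the first abstract can be illustrated with a minimal sketch. It shows only that baseline, not the authors' own proposal. The lowercase regex tokenizer, the unweighted occurrence count, the deterministic tie-break, and the function names `keyword_frequency_score` and `rank_documents` are all assumptions made for illustration.

```python
import re
from collections import Counter


def keyword_frequency_score(document: str, keywords: set[str]) -> int:
    """Score a document by the total number of occurrences of the keywords.

    Assumed tokenization: lowercase alphanumeric runs, no stemming or stop words.
    """
    tokens = re.findall(r"[a-z0-9]+", document.lower())
    counts = Counter(tokens)
    return sum(counts[k.lower()] for k in keywords)


def rank_documents(docs: dict[str, str], keywords: set[str]) -> list[tuple[str, int]]:
    """Return (doc_id, score) pairs sorted by descending keyword frequency."""
    scored = [(doc_id, keyword_frequency_score(text, keywords)) for doc_id, text in docs.items()]
    # Ties are broken by doc_id so the ordering is deterministic.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    docs = {
        "d1": "Search engines rank pages. Ranking pages by keywords is common.",
        "d2": "Keywords, keywords, keywords: frequency is not relevance.",
        "d3": "A document about XML structure.",
    }
    print(rank_documents(docs, {"keywords", "ranking"}))
    # [('d2', 3), ('d1', 2), ('d3', 0)]
```

The example makes the weakness visible: "d2" ranks first purely because it repeats a keyword, which is exactly the equation of relevance with word frequency.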
Document ranking is well known to be a crucial process in information retrieval (IR). It presents retrieved documents in order of their estimated degree of relevance to the query. ...
In this work, we study similarity measures for text-centric XML documents based on an extended vector space model, which considers both document content and structure. Experimenta...
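The snippet does not say how the extended vector space model combines content and structure, so the following is only a generic sketch under an assumed scheme: every word yields a bare-term feature (content) and a feature qualified by its root-to-element tag path (structure), and documents are compared by cosine similarity. The feature scheme and the function names `structural_term_vector` and `cosine` are hypothetical; only `xml.etree.ElementTree` and `collections.Counter` are real standard-library APIs.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def structural_term_vector(xml_text: str) -> Counter:
    """Map an XML document to a sparse term-frequency vector.

    Each word contributes two features (an illustrative assumption):
      - the bare term, which captures content alone;
      - the term qualified by its tag path, which captures structure.
    """
    root = ET.fromstring(xml_text)
    vec: Counter = Counter()

    def walk(elem: ET.Element, path: str) -> None:
        here = f"{path}/{elem.tag}"
        # Text owned by this element: its leading text plus the tails of its children.
        own_text = " ".join([elem.text or ""] + [child.tail or "" for child in elem])
        for word in re.findall(r"[a-z0-9]+", own_text.lower()):
            vec[word] += 1
            vec[f"{here}:{word}"] += 1
        for child in elem:
            walk(child, here)

    walk(root, "")
    return vec


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    a = "<article><title>xml retrieval</title><body>vector model</body></article>"
    b = "<article><title>vector model</title><body>xml retrieval</body></article>"
    # Same words, different placement: content features match, path features do not.
    print(round(cosine(structural_term_vector(a), structural_term_vector(b)), 3))  # 0.5
```

A content-only model would score these two documents as identical (1.0); the path-qualified features pull the score down to 0.5 because the words sit in different elements.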
In this paper, we propose a new similarity measure to compute the pairwise similarity of text-based documents based on the suffix tree document model. By applying the new suffix tree ...
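In a suffix tree document model, a document is treated as a sequence of words, and each node of the tree spells out a phrase. The measure proposed in the abstract is cut off, so the sketch below is a simplified stand-in, not that measure: it uses an uncompressed word-level suffix trie (quadratic in document length, unlike a compressed suffix tree), counts raw phrase occurrences rather than any weighting scheme, and compares documents by cosine similarity over shared phrase nodes. `TrieNode`, `phrase_vectors`, and `phrase_similarity` are hypothetical names.

```python
import math
import re
from collections import Counter


class TrieNode:
    """A node of an uncompressed word-level suffix trie; each node is one phrase."""

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.children: dict[str, "TrieNode"] = {}


def phrase_vectors(docs: list[str]) -> list[Counter]:
    """Insert every word-level suffix of every document into one shared trie.

    For each document, returns a Counter mapping node id -> number of that
    document's suffixes passing through the node, which equals the number of
    occurrences of the phrase spelled by the root-to-node path.
    """
    root = TrieNode(0)
    next_id = 1
    vectors = []
    for doc in docs:
        words = re.findall(r"[a-z0-9]+", doc.lower())
        counts: Counter = Counter()
        for start in range(len(words)):
            node = root
            for word in words[start:]:
                child = node.children.get(word)
                if child is None:
                    child = TrieNode(next_id)
                    next_id += 1
                    node.children[word] = child
                node = child
                counts[node.node_id] += 1
        vectors.append(counts)
    return vectors


def phrase_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of the two documents' phrase-node count vectors."""
    u, v = phrase_vectors([doc_a, doc_b])
    dot = sum(w * v[n] for n, w in u.items() if n in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Shared phrases: "the", "cat", "the cat" (3 of 6 phrase nodes per document).
    print(round(phrase_similarity("the cat sat", "the cat ran"), 3))  # 0.5
```

Because both documents are inserted into one shared trie, a phrase common to both maps to the same node id, so node ids can serve directly as vector dimensions.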
Recent work has demonstrated that the assessment of pairwise object similarity can be approached in an axiomatic manner using information theory. We extend this concept specifica...
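The snippet is truncated before it says which information-theoretic formulation it extends. One well-known instance of this axiomatic approach is Lin's similarity, which for feature sets takes the form sim(A, B) = 2 · I(A ∩ B) / (I(A) + I(B)), where I is the information content of a feature set. The sketch below assumes features occur independently, so I(F) = −Σ log P(f); the feature probabilities, the zero-denominator convention, and the function names are hypothetical.

```python
import math


def information_content(features: set[str], prob: dict[str, float]) -> float:
    """Information (in nats) of observing all features, assuming independence:
    I(F) = -sum over f in F of log P(f)."""
    return -sum(math.log(prob[f]) for f in features)


def lin_similarity(a: set[str], b: set[str], prob: dict[str, float]) -> float:
    """sim(A, B) = 2 * I(A & B) / (I(A) + I(B)).

    The numerator is the information in what A and B share; the denominator is
    the information needed to describe both. The result lies in [0, 1].
    """
    denom = information_content(a, prob) + information_content(b, prob)
    if denom == 0:
        # Undefined when neither description carries information; 0.0 is an assumed convention.
        return 0.0
    return 2 * information_content(a & b, prob) / denom


if __name__ == "__main__":
    # Hypothetical feature probabilities, e.g. document frequencies in a collection.
    prob = {"xml": 0.1, "retrieval": 0.2, "the": 0.9}
    doc = {"xml", "the"}
    print(round(lin_similarity(doc, {"xml", "retrieval"}, prob), 3))  # shares a rare feature
    print(round(lin_similarity(doc, {"the", "retrieval"}, prob), 3))  # shares a common feature
```

The design choice the formula encodes is that sharing a rare feature ("xml") is more informative, and therefore contributes more similarity, than sharing a common one ("the").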