This paper considers the problem of identifying on the Web compound documents (cDocs): groups of web pages that in aggregate constitute semantically coherent information entities...
The dominant method for evaluating search engines is the Cranfield paradigm, but existing metrics do not account for some features of modern search engines, such as document snippets...
Common measures of term importance in information retrieval (IR) rely on counts of term frequency; rare terms receive higher weight in document ranking than common terms receive. ...
We introduce the Ranked Feature Fusion framework for information retrieval system design. Typical information retrieval formalisms such as the vector space model, the best-match mo...
Complex network analysis is a growing research area in a wide variety of domains and has recently become closely associated with data, text and web mining. One of the most active ...
Cristian Klen dos Santos, Alexandre Evsukoff, Beat...