In this paper we investigate how to automatically determine whether two document collections are written from different perspectives. By perspective we mean a point of view, for examp...
A new wrapper model for extracting text data from HTML documents is introduced. Kushmerick's wrapper class (Kushmerick 2000) may be unsuccessful in the case that suff...
Document retrieval in languages with a rich and complex morphology – particularly in terms of derivation and (single-word) composition – suffers from serious performance degra...
Given a large hierarchical concept dictionary (thesaurus, or ontology), the task of selecting the concepts that describe the contents of a given document is considered. A stati...
Alexander F. Gelbukh, Grigori Sidorov, Adolfo Guzm...
Indexing quality has an overwhelming effect on retrieval effectiveness of search engines. In the past few years it has become one of the major challenges in the search engines are...