This paper presents a novel formulation of, and approach to, the minimal document set retrieval problem. Minimal Document Set Retrieval (MDSR) is a promising information retrieval task...
The design of efficient textual similarities is an important issue in the domain of textual data exploration. Textual similarities are, for example, central in document collection s...
Retrieval and indexing in multimedia databases have been an active topic in both the information retrieval and computer vision communities for a long time. In this paper we propose...
It is crucial for a web crawler to distinguish between ephemeral and persistent content. Ephemeral content (e.g., quote of the day) is usually not worth crawling, because by the t...
Hidden Markov models (HMMs) are powerful statistical models that have found successful applications in Information Extraction (IE). In current approaches to applying HMMs to IE, a...