In contrast to classical databases and IR systems, real-world information systems increasingly have to deal with very vague and diverse structures for information management and s...
Xuan Zhou, Julien Gaugaz, Wolf-Tilo Balke, Wolfgan...
In this paper, we address the question of how we can identify hosts that will generate links to web spam. Detecting such spam link generators is important because almost all new s...
High findability of documents within a certain cut-off rank is considered an important factor in recall-oriented application domains such as patent or legal document retrieval. ...
We present a corpus-based approach to the class expansion task. For a given set of seed entities we use co-occurrence statistics taken from a text collection to define a membersh...
The detection of new information in a document stream is an important component of many potential applications. In this paper, a new novelty detection approach based on the identi...