One of the Web information Retrieval (IR) problems these days is to identify redundant information that exist in (replicated) Web documents. These documents can easily be found in...
Document clustering techniques mostly rely on single term analysis of the document data set, such as the Vector Space Model. To better capture the structure of documents, the unde...
Background: Non-sequence gene data (images, literature, etc.) can be found in many different public databases. Access to these data is mostly by text based methods using gene name...
Michael J. Gilchrist, Mikkel B. Christensen, Richa...
The problem of finding your way through a relatively unknown collection of digital documents can be daunting. Such collections sometimes have few categories and little hierarchy, ...
Traditional phrase matching approaches, which can discover documents containing exactly the same phrases, fail to detect documents including phrases that are semantically relevant,...