ThemeRiver™ is a prototype system that visualizes thematic variations over time within a large collection of documents. The “river” flows from left to right through time, ch...
Coreferencing entities across documents in a large corpus enables advanced document understanding tasks such as question answering. This paper presents a novel cross document core...
Jian Huang 0002, Sarah M. Taylor, Jonathan L. Smit...
Current Information Retrieval systems use inverted index structures for efficient query processing. Due to the extremely large size of many data sets, these index structures are u...
Versioning systems such as CVS or Subversion exhibit a large potential to investigate the evolution of software systems. They are used to record the development steps of software ...
Text categorization algorithms usually represent documents as bags of words and consequently have to deal with huge numbers of features. Most previous studies found that the major...