We present a novel system for automatically marking up text documents into XML and discuss the benefits of XML markup for intelligent information retrieval. The system uses the Se...
We present a simple method for language independent and task independent text categorization learning, based on character-level n-gram language models. Our approach uses simple in...
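As a rough illustration of categorization with character-level n-gram language models, the sketch below trains one model per category and assigns a text to the category whose model gives it the highest log-likelihood. The abstract above is truncated, so everything here is an assumption rather than the authors' method: the add-one smoothing, the trigram order, the uniform class prior, and the names `CharNgramModel`, `train_models`, and `classify` are all hypothetical.

```python
import math
from collections import Counter


class CharNgramModel:
    """Character n-gram counts for one category (hypothetical helper)."""

    def __init__(self, n=3):
        self.n = n
        self.ngrams = Counter()    # counts of full n-grams
        self.contexts = Counter()  # counts of the (n-1)-character histories
        self.alphabet = set()

    def _grams(self, text):
        # Left-pad with spaces so every character has a full-length history.
        padded = " " * (self.n - 1) + text
        for i in range(len(text)):
            yield padded[i:i + self.n]

    def train(self, text):
        self.alphabet.update(text)
        for gram in self._grams(text):
            self.ngrams[gram] += 1
            self.contexts[gram[:-1]] += 1

    def log_prob(self, text, vocab_size):
        # Add-one smoothing: an assumption made for this sketch only.
        total = 0.0
        for gram in self._grams(text):
            numerator = self.ngrams[gram] + 1
            denominator = self.contexts[gram[:-1]] + vocab_size
            total += math.log(numerator / denominator)
        return total


def train_models(labeled_texts, n=3):
    """Build one model per label from (label, text) pairs."""
    models = {}
    for label, text in labeled_texts:
        models.setdefault(label, CharNgramModel(n)).train(text)
    return models


def classify(models, text):
    """Return the label whose model scores the text highest (uniform prior)."""
    shared = set().union(*(m.alphabet for m in models.values()))
    vocab_size = len(shared) + 1  # one extra slot for unseen characters
    return max(models, key=lambda label: models[label].log_prob(text, vocab_size))


if __name__ == "__main__":
    models = train_models([
        ("en", "the cat sat on the mat and the dog ran"),
        ("de", "der hund und die katze sind in dem haus"),
    ])
    print(classify(models, "the dog and the cat"))
    print(classify(models, "die katze und der hund"))
```

Because the models operate on raw characters, the sketch needs no tokenizer or word list, which is the property that makes this family of methods language independent.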
A pseudoword is a composite of two or more words chosen at random; the individual occurrences of the original words within a text are replaced by their conflation. Pseu...
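The definition above can be sketched in a few lines: pick member words at random, then replace every occurrence of any member with a single conflated token, keeping the original word as the answer a disambiguation system would have to recover. The function names, the underscore-joined label, and the sample sentence are illustrative assumptions, not taken from the paper.

```python
import random


def make_pseudoword(vocabulary, k=2, seed=0):
    """Choose k distinct member words at random (hypothetical helper)."""
    rng = random.Random(seed)
    return tuple(rng.sample(sorted(vocabulary), k))


def conflate(tokens, members, sep="_"):
    """Replace each occurrence of a member word with the conflated label.

    Returns the rewritten token list and a list of (position, original word)
    pairs, which serve as the gold answers for evaluation.
    """
    label = sep.join(members)
    member_set = set(members)
    rewritten, gold = [], []
    for position, token in enumerate(tokens):
        if token in member_set:
            rewritten.append(label)
            gold.append((position, token))
        else:
            rewritten.append(token)
    return rewritten, gold


if __name__ == "__main__":
    tokens = "the banana fell near the door while the banana rotted".split()
    rewritten, gold = conflate(tokens, ("banana", "door"))
    print(" ".join(rewritten))
    print(gold)
```

The gold list comes for free from the replacement step, which is why pseudowords give labeled evaluation data without manual sense annotation.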
Columbia’s Newsblaster tracking and summarization system is a robust system that clusters news into events, categorizes events into broad topics and summarizes multiple articles...
Kathleen McKeown, Regina Barzilay, John Chen, Davi...
In this paper we present ONTOSCORE, a system for scoring sets of concepts on the basis of an ontology. We apply our system to the task of scoring alternative speech recognition hy...
Iryna Gurevych, Rainer Malaka, Robert Porzel, Hans...