Grounded language models represent the relationship between words and the non-linguistic context in which they are said. This paper describes how they are learned from large corpo...
Splitting compound words has proved to be useful in areas such as Machine Translation, Speech Recognition or Information Retrieval (IR). Furthermore, real-time IR systems (such as...
Different from familiar clustering objects, text documents have sparse data spaces. A common way of representing a document is as a bag of its component words, but the semantic re...
Incorporating background knowledge into data mining algorithms is an important but challenging problem. Current approaches in semi-supervised learning require explicit knowledge p...
Samah Jamal Fodeh, William F. Punch, Pang-Ning Tan
—Techniques for automatic annotation of spoken content making use of speech recognition technology have long been characterized as holding unrealized promise to provide access to...
Roeland Ordelman, Franciska de Jong, Martha Larson