Informal communication (e-mail, bulletin boards) poses a difficult learning environment because traditional grammatical and lexical information are noisy. Other information is nec...
We propose a simple two-level hierarchical probability model for unsupervised word segmentation. By treating words as strings composed of morphemes/phonemes which are themselves c...
Abstract In this paper, we describe our Question Answering (QA) system called QUANTUM. The goal of QUANTUM is to find the answer to a natural language question in a large document ...
Leveraging information from relevance assessments has been proposed as an effective means for improving retrieval. We introduce a novel language modeling method which uses inform...
The structural features of XML components are an extra source of information that should be used in a contentoriented retrieval task on this type of documents. This paper explores...