Part-of-speech (POS) tag distributions are known to exhibit sparsity -- a word is likely to take a single predominant tag in a corpus. Recent research has demonstrated that incorp...
This paper proposes a robust method for word sense disambiguation of Japanese. We combined several classifiers using heterogeneous language resources, a machine-readable dictiona...
Semistatic word-based byte-oriented compression codes are known to be attractive alternatives for compressing natural language texts. With compression ratios around 30%, they allow di...
This paper describes a study in which a corpus of spoken Danish annotated with focus and topic tags was used to investigate the relation between information structure and pauses. ...
We describe a new tagging model where the states of a hidden Markov model (HMM) estimated by unsupervised learning are incorporated as features in a maximum entropy model. Our...