A new type of language resource 'BAStat' has been released by the Bavarian Archive for Speech Signals. In contrast to primary resources like speech and text corpora BASt...
We present in this paper methods to improve HMM-based part-of-speech (POS) tagging of Mandarin. We model the emission probability of an unknown word using all the characters in th...
Discriminative methods have shown significant improvements over traditional generative methods in many machine learning applications, but there has been difficulty in extending th...
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian m...
The design of an optimal Bayesian classifier for multiple features is dependent on the estimation of multidimensional joint probability density functions and therefore requires a ...