Models of bags of words typically assume topic mixing so that the words in a single bag come from a limited number of topics. We show here that many sets of bag of words exhibit a...
The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the variables in the model a...
Latent semantic analysis (LSA), as one of the most popular unsupervised dimension reduction tools, has a wide range of applications in text mining and information retrieval. The k...
Xi Chen, Yanjun Qi, Bing Bai, Qihang Lin, Jaime G....
Given a movie comment, does it contain a spoiler? A spoiler is a comment that, when disclosed, would ruin a surprise or reveal an important plot detail. We study automatic methods...
Many recent statistical parsers rely on a preprocessing step which uses hand-written, corpus-specific rules to augment the training data with extra information. For example, head-...