Publication repositories contain an abundance of information about the evolution of scientific research areas. We address the problem of creating a visualization of a research are...
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian m...
Large 0-1 datasets arise in various applications, such as market basket analysis and information retrieval. We concentrate on the study of topic models, aiming at results which in...
This work concerns automatic topic segmentation of email conversations. We present a corpus of email threads manually annotated with topics, and evaluate annotator reliability. To...
Shafiq R. Joty, Giuseppe Carenini, Gabriel Murray,...
Topic Detection and Tracking (TDT) tasks are evaluated using a cost function. The standard TDT cost function assumes a constant probability of relevance P(rel) across all topics. ...