In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words. Previous research studies in this area mostl...
This paper introduces LDA-G, a scalable Bayesian approach to finding latent group structures in large real-world graph data. Existing Bayesian approaches for group discovery (suc...
Approximating the joint data distribution of a multi-dimensional data set through a compact and accurate histogram synopsis is a fundamental problem arising in numerous practical ...
Amol Deshpande, Minos N. Garofalakis, Rajeev Rasto...
Data--call records, internet packet headers, or other transaction records--are coming down a pipe at a ferocious rate, and we need to monitor statistics of the data. There is no r...
Empirical studies of software defects rely on links between bug databases and program code repositories. This linkage is typically based on bug-fixes identified in developer-enter...
Adrian Bachmann, Christian Bird, Foyzur Rahman, Pr...