Most topic models, such as latent Dirichlet allocation, rely on the bag-of-words assumption. However, word order and phrases are often critical to capturing the meaning of text in...
—The strategies for mining frequent itemsets, which is the essential part of discovering association rules, have been widely studied over the last decade. In real-world datasets,...
Correlation mining has gained great success in many application domains for its ability to capture underlying dependencies between objects. However, research on correlation mining ...
This paper presents a methodology for knowledge acquisition from source code. We use data mining to support semiautomated software maintenance and comprehension and provide practi...
Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop Sourcerer, an infrastructure for the automated c...
Erik Linstead, Paul Rigor, Sushil Krishna Bajracha...