When multiple data sources are available for clustering, an a priori data integration process is usually required. This process may be costly and may not lead to good clusterings,...
Elisa Boari de Lima, Raquel Cardoso de Melo Minard...
We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic pro...
Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, T...
The adoption of distributed version control (DVC), such as Git and Mercurial, in open-source software (OSS) projects has been explosive. Why is this and how are projects using DVC?...
Earl T. Barr, Christian Bird, Peter C. Rigby, Abra...
We describe a machine learning approach for predicting sponsored search ad relevance. Our baseline model incorporates basic features of text overlap and we then extend the model t...
Dustin Hillard, Stefan Schroedl, Eren Manavoglu, H...
Semantic similarity measures play important roles in information retrieval and Natural Language Processing. Previous work in semantic web-related applications such as community mi...