Software systems are designed and engineered to process data. However, software is data too. The size and variety of today's software artifacts and the multitude of stakehold...
Deduplication is a key operation in integrating data from multiple sources. The main challenge in this task is designing a function that can resolve when a pair of records refer t...
In this paper, we present a new co-training strategy that makes use of unlabelled data. It trains two predictors in parallel, with each predictor labelling the unlabelled data for...
In this paper, we study the problem of discovering interesting patterns through user's interactive feedback. We assume a set of candidate patterns (i.e., frequent patterns) h...
Argo is a web-based NLP and text mining workbench with a convenient graphical user interface for designing and executing processing workflows of various complexity. The workbench...