This paper considers the problem of identifying on the Web compound documents (cDocs) ? groups of web pages that in aggregate constitute semantically coherent information entities...
Dirty data is a serious problem for businesses leading to incorrect decision making, inefficient daily operations, and ultimately wasting both time and money. Dirty data often ari...
The usual data mining setting uses the full amount of data to derive patterns for different purposes. Taking cues from machine learning techniques, we explore ways to divide the d...
When intelligent systems reason about complex problems with a large hierarchical classification space it is hard to evaluate system performance. For classification problems, differ...
This paper addresses the problem of dynamic model parameter selection for loglinear model based statistical machine translation (SMT) systems. In this work, we propose a principle...