We address the problem of identifying the domain of online databases. More precisely, given a set F of Web forms automatically gathered by a focused crawler and an online database...
Abstract. In experimental design, a standard approach for distinguishing experimentally induced effects from unwanted effects is to design control measurements that differ only ...
Address standardization is a very challenging task in data cleansing. To provide better customer relationship management and business intelligence for customer-oriented cooperates...
This paper considers a method for learning a distance metric in a fingerprinting system which identifies a query content by measuring the distance between the fingerprint of th...
Wikipedia provides an information quality assessment model with criteria for human peer reviewers to identify featured articles. For this classification task “Is an article fea...