— One of the most prominent data quality problems is the existence of duplicate records. Current data cleaning systems usually produce one clean instance (repair) of the input da...
George Beskales, Mohamed A. Soliman, Ihab F. Ilyas...
Record linkage, the problem of determining when two records refer to the same entity, has applications for both data cleaning (deduplication) and for integrating data from multipl...
—We investigate the design of a clean-slate control and nt plane for data networks using the abstraction of 4D architecture, utilizing and extending 4D’s concept of logically c...
Cleaneval is a shared task and competitive evaluation on the topic of cleaning arbitrary web pages, with the goal of preparing web data for use as a corpus for linguistic and lang...
Marco Baroni, Francis Chantree, Adam Kilgarriff, S...
High performance general-purpose processors are increasingly being used for a variety of application domains scienti c, engineering, databases, and more recently, media processing...
Parthasarathy Ranganathan, Sarita V. Adve, Norman ...