One of the most prominent data quality problems is the existence of duplicate records. Current data cleaning systems usually produce one clean instance (repair) of the input da...
George Beskales, Mohamed A. Soliman, Ihab F. Ilyas...
This paper reports on a study involving the automatic extraction of Chinese legal terms. We used a word-segmented corpus of Chinese court judgments to extract salient legal expres...
Maps are artifacts often derived from multiple sources of data, e.g., sensors, and processed by multiple methods, e.g., gridding and smoothing algorithms. As a result, complex meta...
Nicholas Del Rio, Paulo Pinheiro da Silva, Ann Q. ...
Industrial databases often contain a large amount of unfilled information. During the knowledge discovery process one processing step is often necessary in order to remove these ...
As sensing technologies become increasingly distributed and democratized, citizens and novice users are becoming responsible for the kinds of data collection and analysis that have...
Wesley Willett, Paul M. Aoki, Neil Kumar, Sushmita...