Sciweavers

1260 search results - page 207 / 252
» Data Quality in Genome Databases
STOC
2001
ACM
Algorithms
14 years 8 months ago
Data-streams and histograms
Histograms are typically used to approximate data distributions. Histograms and related synopsis structures have been successful in a wide variety of popular database applications...
Sudipto Guha, Nick Koudas, Kyuseok Shim
DEXA
2007
Springer
Database
14 years 1 months ago
Performance Oriented Schema Matching
Semantic matching of schemas in heterogeneous data sharing systems is time consuming and error prone. Existing mapping tools employ semi-automatic techniques for mapping ...
Khalid Saleem, Zohra Bellahsene, Ela Hunt
KDD
2005
ACM
Data Mining
14 years 1 months ago
Cross-relational clustering with user's guidance
Clustering is an essential data mining task with numerous applications. However, data in most real-life applications are high-dimensional in nature, and the related information of...
Xiaoxin Yin, Jiawei Han, Philip S. Yu
ICDE
2009
IEEE
Database
14 years 9 months ago
Large-Scale Deduplication with Constraints Using Dedupalog
We present a declarative framework for collective deduplication of entity references in the presence of constraints. Constraints occur naturally in many data cleaning domains and c...
Arvind Arasu, Christopher Ré, Dan Suciu
SIGMOD
2004
ACM
Database
14 years 7 months ago
Adaptive Ordering of Pipelined Stream Filters
We consider the problem of pipelined filters, where a continuous stream of tuples is processed by a set of commutative filters. Pipelined filters are common in stream applications...
Shivnath Babu, Rajeev Motwani, Kamesh Munagala, It...