Sciweavers

67 search results - page 9 / 14
» A Primitive Operator for Similarity Joins in Data Cleaning
SIGMOD
2010
ACM
174 views, Database
14 years 5 days ago
Sampling dirty data for matching attributes
We investigate the problem of creating and analyzing samples of relational databases to find relationships between string-valued attributes. Our focus is on identifying attribute...
Henning Köhler, Xiaofang Zhou, Shazia Wasim S...
BTW
2007
Springer
133 views, Database
13 years 11 months ago
Pathfinder: XQuery Compilation Techniques for Relational Database Targets
Relational database systems are highly efficient hosts to table-shaped data. It is all the more interesting to see how a careful inspection of both, the XML tree structure as wel...
Jens Teubner
DMIN
2009
142 views, Data Mining
13 years 5 months ago
Efficient Record Linkage using a Double Embedding Scheme
Record linkage is the problem of identifying similar records across different data sources. The similarity between two records is defined based on domain-specific similarity functi...
Noha Adly
VLDB
2005
ACM
117 views, Database
14 years 25 days ago
Parallel Querying with Non-Dedicated Computers
We present DITN, a new method of parallel querying based on dynamic outsourcing of join processing tasks to non-dedicated, heterogeneous computers. In DITN, partitioning is not th...
Vijayshankar Raman, Wei Han, Inderpal Narang
KDD
2001
ACM
253 views, Data Mining
14 years 7 months ago
GESS: a scalable similarity-join algorithm for mining large data sets in high dimensional spaces
The similarity join is an important operation for mining high-dimensional feature spaces. Given two data sets, the similarity join computes all tuples (x, y) that are within a dis...
Jens-Peter Dittrich, Bernhard Seeger