Sciweavers

652 search results for "Accelerated EM-based clustering of large data sets"

SIGMOD 2011 (ACM), Database
A platform for scalable one-pass analytics using MapReduce
Today’s one-pass analytics applications tend to be data-intensive in nature and require the ability to process high volumes of data efficiently. MapReduce is a popular programm...
Boduo Li, Edward Mazur, Yanlei Diao, Andrew McGreg...

DEXA 2007 (Springer), Database
Performance Oriented Schema Matching
Semantic matching of schemas in heterogeneous data sharing systems is time consuming and error prone. Existing mapping tools employ semi-automatic techniques for mapping ...
Khalid Saleem, Zohra Bellahsene, Ela Hunt

PVLDB 2010
Entity Resolution with Evolving Rules
Entity resolution (ER) identifies database records that refer to the same real-world entity. In practice, ER is not a one-time process, but is constantly improved as the data, sc...
Steven Whang, Hector Garcia-Molina

DATAMINE 2007
Using metarules to organize and group discovered association rules
The high dimensionality of massive data results in the discovery of a large number of association rules. The huge number of rules makes it difficult to interpret and react to all ...
Abdelaziz Berrado, George C. Runger

DASFAA 2004 (IEEE), Database
Semi-supervised Text Classification Using Partitioned EM
Text classification using a small labeled set and a large unlabeled data set is seen as a promising technique to reduce the labor-intensive and time-consuming effort of labeling traini...
Gao Cong, Wee Sun Lee, Haoran Wu, Bing Liu