Sciweavers

1260 search results - page 172 / 252
» Data Quality in Genome Databases
KDD
2008
ACM
Febrl: an open source data cleaning, deduplication and record linkage system with a graphical user interface
Matching records that refer to the same entity across databases is becoming an increasingly important part of many data mining projects, as often data from multiple sources needs ...
Peter Christen
VLDB
2005
ACM
Loadstar: Load Shedding in Data Stream Mining
In this demo, we show that intelligent load shedding is essential in achieving optimum results in mining data streams under various resource constraints. The Loadstar system intro...
Yun Chi, Haixun Wang, Philip S. Yu
ICDE
2008
IEEE
Stop Chasing Trends: Discovering High Order Models in Evolving Data
Many applications are driven by evolving data: patterns in web traffic, program execution traces, network event logs, etc., are often non-stationary. Building prediction...
Shixi Chen, Haixun Wang, Shuigeng Zhou, Philip S. ...
CAISE
2006
Springer
Bridging the Gap between Data Warehouses and Organizations
Data Warehouse (DWH) systems are used by decision makers for performance measurement and decision support. Currently the main focus of the DWH research field is not as mu...
Veronika Stefanov
KDD
2005
ACM
A distributed learning framework for heterogeneous data sources
We present a probabilistic model-based framework for distributed learning that takes into account privacy restrictions and is applicable to scenarios where the different sites ha...
Srujana Merugu, Joydeep Ghosh