
KDD
1998
ACM


On the Efficient Gathering of Sufficient Statistics for Classification from Large SQL Databases



For a wide variety of classification algorithms, scalability to large databases can be achieved by observing that most algorithms are driven by a set of sufficient statistics that are significantly smaller than the data. By relying on a SQL backend to compute the sufficient statistics, we leverage the query processing system of SQL databases and avoid the need for moving data to the client. We present a new SQL operator (Unpivot) that enables efficient gathering of statistics with minimal changes to the SQL backend. Our approach results in a significant increase in performance without requiring any changes to the physical layout of the data. We show analytically how this approach outperforms an alternative that requires changing the data layout. We also compare the effect of data representation and show that a "dense" representation may be preferred to a "sparse" one, even when the data are fairly sparse.
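To make the idea of server-side sufficient statistics concrete, here is a minimal sketch in Python with the standard `sqlite3` module. It is not the authors' implementation. SQLite has no Unpivot operator, so the sketch emulates the unpivot step with `UNION ALL`, which rescans the table once per attribute and therefore does not reproduce the efficiency the abstract claims. The table name `cases`, the column names, the helper `gather_counts`, and the sample rows are all hypothetical. What it does illustrate is that the per-(attribute, value, class) counts are computed inside the database, and only the small counts table reaches the client.

```python
import sqlite3


def gather_counts(rows, attributes, class_column="cls"):
    """Return {(attribute, value, class): count}, computed inside SQLite.

    Hypothetical helper for illustration; names are assumptions, not
    taken from the paper.
    """
    conn = sqlite3.connect(":memory:")
    columns = attributes + [class_column]
    col_defs = ", ".join(f"{name} TEXT" for name in columns)
    conn.execute(f"CREATE TABLE cases ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO cases VALUES ({placeholders})", rows)

    # Emulated unpivot: one (attr, val, cls) row per attribute per case.
    # This is a stand-in for the Unpivot operator, not the operator itself.
    unpivoted = " UNION ALL ".join(
        f"SELECT '{name}' AS attr, {name} AS val, {class_column} AS cls "
        "FROM cases"
        for name in attributes
    )

    # A single GROUP BY over the unpivoted rows yields the counts table.
    query = (
        f"SELECT attr, val, cls, COUNT(*) FROM ({unpivoted}) "
        "GROUP BY attr, val, cls"
    )
    counts = {(a, v, c): n for a, v, c, n in conn.execute(query)}
    conn.close()
    return counts


# Made-up sample data: (outlook, humidity, class label).
rows = [
    ("sunny", "high", "no"),
    ("sunny", "low", "yes"),
    ("rainy", "high", "no"),
    ("rainy", "low", "yes"),
    ("sunny", "high", "yes"),
]
counts = gather_counts(rows, ["outlook", "humidity"])
```

A classifier that works from counts, such as a decision tree choosing a split or a naive Bayes model, could then read `counts` rather than the raw rows.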

Goetz Graefe, Usama M. Fayyad, Surajit Chaudhuri


Data Mining | KDD 1998 | SQL Backend | SQL Databases | Sufficient Statistics


Related Content

» RIOT: I/O-Efficient Numerical Computing without SQL

» COLR-Tree: Communication-Efficient Spatio-Temporal Indexing for a Sensor Data Web Portal

» BICEPP: an example-based statistical text mining method for predicting the binary characteri...

» Interactive exploration of very large relational datasets through 3D dynamic projections

» Models and Indices for Integrating Unstructured Data with a Relational Database

» Mayday: integrative analytics for expression data

» Automatic web query classification using labeled and unlabeled training data

» Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection

» Texture retrieval based on a nonparametric measure for multivariate distributions

Post Info

Added	06 Aug 2010
Updated	06 Aug 2010
Type	Conference
Year	1998
Where	KDD
Authors	Goetz Graefe, Usama M. Fayyad, Surajit Chaudhuri
