large data sets | Sciweavers

DASFAA
2008
IEEE

109 views, Database

Bulk-Loading the ND-Tree in Non-ordered Discrete Data Spaces

16 years 1 month ago

Applications demanding multidimensional index structures for performing efficient similarity queries often involve a large amount of data. The conventional tuple-loading approach t...

Hyun-Jeong Seok, Gang Qian, Qiang Zhu, Alexander R...

SEMWEB
2009
Springer

227 views, Internet Technology

Functions over RDF Language Elements

16 years 1 month ago

Download www.cs.univie.ac.at

Spreadsheet tools are often used in business and private scenarios in order to collect and store data, and to explore and analyze these data by executing functions and aggregation...

Bernhard Schandl

PODS
2007
ACM

139 views, Database

Management of probabilistic data: foundations and challenges

16 years 6 months ago

Download www.cs.washington.edu

Many applications today need to manage large data sets with uncertainties. In this paper we describe the foundations of managing data where the uncertainties are quantified as pro...

Nilesh N. Dalvi, Dan Suciu

SIGMOD
2001
ACM

193 views, Database

Epsilon Grid Order: An Algorithm for the Similarity Join on Massive High-Dimensional Data

16 years 6 months ago

Download www.dbs.informatik.uni-muenchen.de

The similarity join is an important database primitive which has been successfully applied to speed up applications such as similarity search, data analysis and data mining. The s...

Christian Böhm, Bernhard Braunmüller, Fl...

RECOMB
2004
Springer

182 views, Computational Biology

Computational identification of evolutionarily conserved exons

16 years 7 months ago

Download compgen.bscb.cornell.edu

Phylogenetic hidden Markov models (phylo-HMMs) have recently been proposed as a means for addressing a multispecies version of the ab initio gene prediction problem. These models ...

Adam C. Siepel, David Haussler

CHI
2003
ACM

113 views, Human Computer Interaction

Efficient user interest estimation in fisheye views

16 years 7 months ago

Download jheer.org

We present a new technique for efficiently computing Degree-of-Interest distributions to inform the visualization of graph-structured data. The technique is independent of the int...

Jeffrey Heer, Stuart K. Card

KDD
2003
ACM

180 views, Data Mining

Classifying large data sets using SVMs with hierarchical clusters

16 years 7 months ago

Download vorlon.case.edu

Support vector machines (SVMs) have been promising methods for classification and regression analysis because of their solid mathematical foundations which convey several salient ...

Hwanjo Yu, Jiong Yang, Jiawei Han

KDD
2007
ACM

335 views, Data Mining

Detecting changes in large data sets of payment card data: a case study

16 years 7 months ago

Download www.opendatagroup.com

An important problem in data mining is detecting changes in large data sets. Although there are a variety of change detection algorithms that have been developed, in practice it c...

Chris Curry, Robert L. Grossman, David Locke, Stev...

ICML
2005
IEEE

98 views, Machine Learning

Intrinsic dimensionality estimation of submanifolds in R^d

16 years 7 months ago

Download www.machinelearning.org

We present a new method to estimate the intrinsic dimensionality of a submanifold M in R^d from random samples. The method is based on the convergence rates of a certain U-statisti...

Matthias Hein, Jean-Yves Audibert

ICML
2008
IEEE

138 views, Machine Learning

Fully distributed EM for very large datasets

16 years 7 months ago

Download www.cs.berkeley.edu

In EM and related algorithms, E-step computations distribute easily, because data items are independent given parameters. For very large data sets, however, even storing all of th...

Jason Wolfe, Aria Haghighi, Dan Klein
