data sets | Sciweavers

SAC
2009
ACM

113 views | Applied Computing

Combining statistics and semantics via ensemble model for document clustering

16 years 1 month ago

Incorporating background knowledge into data mining algorithms is an important but challenging problem. Current approaches in semi-supervised learning require explicit knowledge p...

Samah Jamal Fodeh, William F. Punch, Pang-Ning Tan

CBMS
2009
IEEE

161 views | Medical Imaging

Domain concept-based queries for cancer research data sources

16 years 1 month ago

Download www.cs.ucl.ac.uk

Biomedical scientists generate, access, validate and interpret multiple distributed and heterogeneous data sets. Semantic annotations for these data sets are paramount for exchang...

Alejandra González Beltrán, Anthony ...

BIBM
2009
IEEE

192 views | Bioinformatics

A Multi-task Feature Selection Filter for Microarray Classification

16 years 1 month ago

Download www.ist.temple.edu

A major challenge in microarray classification and biomarker discovery is dealing with small-sample high-dimensional data where the number of genes used as features is typically o...

Liang Lan, Slobodan Vucetic

EDBT
2010
ACM

116 views | Database

HARRA: fast iterative hashed record linkage for large-scale data collections

16 years 1 month ago

Download pike.psu.edu

We study the performance issue of the “iterative” record linkage (RL) problem, where match and merge operations may occur together in iterations until convergence emerges. We ...

Hung-sik Kim, Dongwon Lee

ICDE
2010
IEEE

408 views | Database

Hive - a petabyte scale data warehouse using Hadoop

16 years 1 month ago

Download infolab.stanford.edu

The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensiv...

Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zhen...

WWW
2010
ACM

233 views | Internet Technology

Inferring relevant social networks from interpersonal communication

16 years 1 month ago

Download www.public.asu.edu

Researchers increasingly use electronic communication data to construct and study large social networks, effectively inferring unobserved ties (e.g. i is connected to j) from obs...

Munmun De Choudhury, Winter A. Mason, Jake M. Hofm...

CVPR
2010
IEEE

266 views | Computer Vision

Unsupervised Learning of Invariant Features Using Video

16 years 3 months ago

Download ai.stanford.edu

We present an algorithm that learns invariant features from real data in an entirely unsupervised fashion. The principal benefit of our method is that it can be applied without hu...

David Stavens, Sebastian Thrun

RECOMB
2001
Springer

126 views | Computational Biology

Analysis techniques for microarray time-series data

16 years 7 months ago

Download www.cs.ucdavis.edu

We address possible limitations of publicly available data sets of yeast gene expression. We study the predictability of known regulators via time-series analysis, and show that l...

Vladimir Filkov, Steven Skiena, Jizu Zhi

KDD
2009
ACM

229 views | Data Mining

An association analysis approach to biclustering

16 years 7 months ago

Download www-users.cs.umn.edu

The discovery of biclusters, which denote groups of items that show coherent values across a subset of all the transactions in a data set, is an important type of analysis perform...

Gaurav Pandey, Gowtham Atluri, Michael Steinbach, ...

ICML
2000
IEEE

127 views | Machine Learning

A Dynamic Adaptation of AD-trees for Efficient Machine Learning on Large Data Sets

16 years 7 months ago

Download www.aladdin.cs.cmu.edu

This paper has no novel learning or statistics: it is concerned with making a wide class of preexisting statistics and learning algorithms computationally tractable when faced wit...

Paul Komarek, Andrew W. Moore
