Sciweavers

NIPS
2007
14 years 28 days ago
Consistent Minimization of Clustering Objective Functions
Clustering is often formulated as a discrete optimization problem. The objective is to find, among all partitions of the data set, the best one according to some quality measure....
Ulrike von Luxburg, Sébastien Bubeck, Stefa...
SDM
2008
SIAM
14 years 28 days ago
Simultaneous Unsupervised Learning of Disparate Clusterings
Most clustering algorithms produce a single clustering for a given data set even when the data can be clustered naturally in multiple ways. In this paper, we address the difficult...
Prateek Jain, Raghu Meka, Inderjit S. Dhillon
LREC
2008
14 years 29 days ago
Division of Example Sentences Based on the Meaning of a Target Word Using Semi-Supervised Clustering
In this paper, we describe a system that divides example sentences (data set) into clusters, based on the meaning of the target word, using a semi-supervised clustering technique....
Hiroyuki Shinnou, Minoru Sasaki
LREC
2008
14 years 29 days ago
Spectral Clustering for a Large Data Set by Reducing the Similarity Matrix Size
Spectral clustering is a powerful clustering method for document data sets. However, spectral clustering needs to solve an eigenvalue problem of the matrix converted from the simil...
Hiroyuki Shinnou, Minoru Sasaki
ICMLA
2008
14 years 29 days ago
Graph-Based Multilevel Dimensionality Reduction with Applications to Eigenfaces and Latent Semantic Indexing
Dimension reduction techniques have been successfully applied to face recognition and text information retrieval. The process can be time-consuming when the data set is large. Thi...
Sophia Sakellaridi, Haw-ren Fang, Yousef Saad
HIS
2008
14 years 29 days ago
Diagnosing Patients Combining Principal Components Analysis and Case Based Reasoning
This paper addresses the application of PCA to categorical data prior to diagnosing a patient data set using a Case-Based Reasoning (CBR) system. The particularity is th...
Carles Pous, Dani Caballero, Beatriz López
ESANN
2008
14 years 29 days ago
Homogeneous bipartition based on multidimensional ranking
We present an algorithm which partitions a data set into two parts of equal size and, experimentally, nearly the same distribution, measured through the likelihood of a Parzen kernel ...
Michaël Aupetit
SDM
2010
SIAM
14 years 29 days ago
Adaptive Informative Sampling for Active Learning
Many approaches to active learning involve periodically training one classifier and choosing data points with the lowest confidence. An alternative approach is to periodically cho...
Zhenyu Lu, Xindong Wu, Josh Bongard
DRR
2008
14 years 29 days ago
Segmentation-based retrieval of document images from diverse collections
We describe a methodology for retrieving document images from large, extremely diverse collections. First we perform content extraction, that is, the location and measurement of reg...
Michael A. Moll, Henry S. Baird
LREC
2010
14 years 29 days ago
A Resource for Investigating the Impact of Anaphora and Coreference on Inference
Discourse phenomena play a major role in text processing tasks. However, so far relatively little study has been devoted to the relevance of discourse phenomena for inference. The...
Azad Abad, Luisa Bentivogli, Ido Dagan, Danilo Gia...