
Data Mining, SDM 2008
Roughly Balanced Bagging for Imbalanced Data
Imbalanced class problems appear in many real applications of classification learning. We propose a novel sampling method to improve bagging for data sets with skewed class distri...
Shohei Hido, Hisashi Kashima
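The abstract above is cut off before it describes the authors' sampling scheme, so the sketch below does not implement their Roughly Balanced Bagging. It shows plain balanced (under-sampled) bagging, a simpler related idea: each bag draws the same number of examples, with replacement, from every class, so no base model sees the skew. The nearest-centroid base learner and the names `balanced_bagging`, `fit_centroids`, `predict_one` and `predict` are hypothetical choices for illustration only.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Hypothetical base learner: store the mean feature vector of each class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_one(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))


def balanced_bagging(X, y, n_bags=11, seed=0):
    """Train one base model per bag; every bag holds n_min samples of each class.

    n_min is the size of the smallest class. This is plain balanced bagging,
    not the (unstated) sampling rule of the paper above.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    models = []
    for _ in range(n_bags):
        bag_X, bag_y = [], []
        for label, xs in by_class.items():
            # Bootstrap with replacement, same size for every class -> balanced bag.
            bag_X.extend(rng.choices(xs, k=n_min))
            bag_y.extend([label] * n_min)
        models.append(fit_centroids(bag_X, bag_y))
    return models


def predict(models, x):
    """Majority vote over the ensemble (an odd n_bags avoids binary ties)."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy skewed data: three majority-class points, one minority-class point.
    X = [[0.0], [0.1], [0.2], [10.0]]
    y = [0, 0, 0, 1]
    models = balanced_bagging(X, y, n_bags=5, seed=3)
    print(predict(models, [9.0]), predict(models, [0.05]))
```

Because every bag is balanced, the single minority example carries as much weight as the majority class in each base model. A plain bootstrap of the same data would often contain no minority example at all.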
Data Mining, SDM 2008
Semi-Supervised Clustering via Matrix Factorization
Recent years have witnessed a surge of interest in semi-supervised clustering methods, which aim to cluster the data set under the guidance of some supervisory information. U...
Fei Wang, Tao Li, Changshui Zhang

A Weighted Distance Measure for Calculating the Similarity of Sparsely Distributed Trajectories
This article presents a method for calculating the similarity of two trajectories. The method is designed especially for situations where the points of the trajectories are distr...
Pekka Siirtola, Perttu Laurinen, Juha Röning

Artificial Data Sets Based on Knowledge Generators: Analysis of Learning Algorithms Efficiency
This paper proposes a methodology to generate artificial data sets to evaluate the behavior of machine learning techniques. The methodology relies on the definition of a domain an...
Joaquin Rios-Boutin, Albert Orriols-Puig, Josep Ma...

Visualization and exploration of time-varying medical image data sets
In this work, we propose and compare several methods for the visualization and exploration of time-varying volumetric medical images based on the temporal characteristics of the d...
Zhe Fang, Torsten Möller, Ghassan Hamarneh, A...

Parallelizing single patch pass clustering
Clustering algorithms such as k-means, the self-organizing map (SOM), or Neural Gas (NG) constitute popular tools for automated information analysis. Since data sets are becoming l...
Nikolai Alex, Barbara Hammer

Data Mining, SDM 2010
Scalable Tensor Factorizations with Missing Data
The problem of missing data is ubiquitous in domains such as biomedical signal processing, network traffic analysis, bibliometrics, social network analysis, chemometrics, computer...
Evrim Acar, Daniel M. Dunlavy, Tamara G. Kolda, Mo...

A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers
There is growing interest in applying Bayesian techniques to NLP problems. There are a number of different estimators for Bayesian models, and it is useful to know what kinds of t...
Jianfeng Gao, Mark Johnson

Data Mining, DMIN 2008
Multi-Class SVM for Large Data Sets Considering Models of Classes Distribution
Support Vector Machines (SVMs) have attracted considerable interest among researchers. One of the important issues concerning SVMs is their application to large data sets. It is rec...
Jair Cervantes, Xiaoou Li, Wen Yu

The CoNLL 2007 Shared Task on Dependency Parsing
The Conference on Computational Natural Language Learning features a shared task, in which participants train and test their learning systems on the same data sets. In 2007, as in...
Joakim Nivre, Johan Hall, Sandra Kübler, Ryan...