KDD
2004
ACM
A cross-collection mixture model for comparative text mining
In this paper, we define and study a novel text mining problem, which we refer to as Comparative Text Mining (CTM). Given a set of comparable text collections, the task of compara...
ChengXiang Zhai, Atulya Velivelli, Bei Yu
KDD
2004
ACM
Scalable mining of large disk-based graph databases
Mining frequent structural patterns from graph databases is an interesting problem with broad applications. Most of the previous studies focus on pruning unfruitful search subspac...
Chen Wang, Wei Wang, Jian Pei, Yongtai Zhu, B...
KDD
2004
ACM
1-dimensional splines as building blocks for improving accuracy of risk outcomes models
Transformation of both the response variable and the predictors is commonly used in fitting regression models. However, these transformation methods do not always provide the maxi...
David S. Vogel, Morgan C. Wang
KDD
2004
ACM
Rotation invariant distance measures for trajectories
For the discovery of similar patterns in 1D time-series, it is very typical to perform a normalization of the data (for example a transformation so that the data follow a zero mea...
Michail Vlachos, Dimitrios Gunopulos, Gautam Das
KDD
2004
ACM
Learning a complex metabolomic dataset using random forests and support vector machines
Metabolomics is the omics science of biochemistry. The associated data include the quantitative measurements of all small molecule metabolites in a biological sample. These datase...
Young Truong, Xiaodong Lin, Chris Beecher
KDD
2004
ACM
A generative probabilistic approach to visualizing sets of symbolic sequences
There is a notable interest in extending probabilistic generative modeling principles to accommodate for more complex structured data types. In this paper we develop a generative ...
Peter Tiño, Ata Kabán, Yi Sun
KDD
2004
ACM
Ordering patterns by combining opinions from multiple sources
Pattern ordering is an important task in data mining because the number of patterns extracted by standard data mining algorithms often exceeds our capacity to manually analyze the...
Pang-Ning Tan, Rong Jin
KDD
2004
ACM
Probabilistic author-topic models for information discovery
We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic pro...
Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, T...
KDD
2004
ACM
Generalizing the notion of support
The goal of this paper is to show that generalizing the notion of support can be useful in extending association analysis to non-traditional types of patterns and non-binary data....
Michael Steinbach, Pang-Ning Tan, Hui Xiong, Vipin...