We propose a family of novel cost-sensitive boosting methods for multi-class classification by applying the theory of gradient boosting to p-norm based cost functionals. We establ...
Naive Bayes and logistic regression perform well in different regimes. While the former is a very simple generative model which is efficient to train and performs well empirically...
Many time series prediction methods have focused on single step or short term prediction problems due to the inherent difficulty in controlling the propagation of errors from one ...
Traditional association mining algorithms use a strict definition of support that requires every item in a frequent itemset to occur in each supporting transaction. In real-life d...
Rohit Gupta, Gang Fang, Blayne Field, Michael Stei...
This paper presents a new algorithm for sequence prediction over long categorical event streams. The input to the algorithm is a set of target event types whose occurrences we wis...
Information-theoretic clustering aims to exploit information theoretic measures as the clustering criteria. A common practice on this topic is so-called INFO-K-means, which perfor...
We present a detailed study of network evolution by analyzing four large online social networks with full temporal information about node and edge arrivals. For the first time at ...
Jure Leskovec, Lars Backstrom, Ravi Kumar, Andrew ...
Transactional data are ubiquitous. Several methods, including frequent itemsets mining and co-clustering, have been proposed to analyze transactional databases. In this work, we p...
Yang Xiang, Ruoming Jin, David Fuhry, Feodor F. Dr...
Researchers in the social and behavioral sciences routinely rely on quasi-experimental designs to discover knowledge from large databases. Quasi-experimental designs (QEDs) exploi...
David D. Jensen, Andrew S. Fast, Brian J. Taylor, ...