Binary classification is a core data mining task. For large datasets or real-time applications, a desirable classifier is accurate, fast, and requires no parameter tuning. We present a simple implementation of logistic regression that meets these requirements. A combination of regularization, truncated Newton methods, and iteratively re-weighted least squares makes it faster and more accurate than modern SVM implementations, and relatively insensitive to its parameters. It is robust to linear dependencies and some scaling problems, making most data preprocessing unnecessary.

1 Motivation and Terminology

This article is motivated by the success of a fast, simple logistic regression (LR) algorithm in several high-dimensional data mining engagements, including life sciences data mining [10, 7], threat classification and temporal link analysis [16], collaborative filtering [11], and text processing [7]. The rise of support vector machines (SVMs) for binary classification has renewed interest i...
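To make the combination described in the abstract concrete, the following is a minimal sketch, in Python with NumPy, of a solver of this general kind: ridge-regularized logistic regression fit by iteratively re-weighted least squares, with each Newton step solved approximately by a truncated (early-terminated) conjugate gradient loop so the Hessian is applied only through matrix-vector products. The function names, the ridge penalty lam, and the iteration limits below are illustrative assumptions rather than details of the implementation evaluated in this article.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cg_solve(apply_A, b, max_iters=30, tol=1e-4):
    # Truncated conjugate gradient: approximately solve A x = b, where the
    # symmetric positive-definite matrix A is available only through the
    # matrix-vector product apply_A(v).
    x = np.zeros_like(b)
    r = b - apply_A(x)
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iters):
        Ap = apply_A(p)
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def lr_irls_tn(X, y, lam=10.0, newton_iters=30, cg_iters=30, tol=1e-6):
    # Ridge-regularized logistic regression fit by IRLS; each Newton step
    # (X'WX + lam*I) d = gradient is solved approximately by truncated CG.
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(newton_iters):
        mu = sigmoid(X @ beta)                # predicted probabilities
        w = mu * (1.0 - mu)                   # IRLS weights
        grad = X.T @ (y - mu) - lam * beta    # gradient of penalized log-likelihood
        hess_mv = lambda v: X.T @ (w * (X @ v)) + lam * v
        step = cg_solve(hess_mv, grad, max_iters=cg_iters)
        beta += step
        if np.linalg.norm(step) < tol:        # stop once Newton steps are negligible
            break
    return beta

In a sketch of this sort, a single fixed ridge penalty and a loose CG termination rule are typically adequate across datasets, which is consistent with the abstract's claim that the method needs little or no parameter tuning.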