Transformation of both the response variable and the predictors is commonly used in fitting regression models. However, these transformation methods do not always provide the maximum linear correlation between the response variable and the predictors, especially when the relationships between the predictors and the response are non-linear, as in the medical data set used in this study. A spline-based transformation method is proposed that is second-order smooth, continuous, and minimizes the mean squared error between the response and each predictor. Because the computation time for generating this spline is O(n), the processing time remains reasonable for massive data sets. In contrast to cubic smoothing splines, the resulting transformation equations are also highly efficient for scoring. Data used for predicting health outcomes contain an abundance of non-linear relationships between the predictors and the outcomes, requiring an algorithm that models them accurately. Thus, a tr...
David S. Vogel, Morgan C. Wang
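
The idea of transforming a predictor so that its linear correlation with the response increases can be illustrated with a small sketch. This is not the authors' spline: it swaps in a much simpler stand-in, a piecewise-linear curve through equal-width bin means, which is continuous but not second-order smooth. The function names (`fit_binned_transform`, `apply_transform`), the bin count, and the synthetic quadratic data are all assumptions made for illustration. The bin sums are accumulated in a single pass over the data, so the fitting step is O(n) for a fixed number of bins.

```python
import math


def pearson(a, b):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def fit_binned_transform(x, y, n_bins=10):
    """Return knots (bin center, mean response) from one pass over the data.

    Hypothetical helper: a stand-in for a fitted spline, not the paper's method.
    """
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("predictor is constant; nothing to transform")
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for xi, yi in zip(x, y):
        k = min(int((xi - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        sums[k] += yi
        counts[k] += 1
    # Skip empty bins so every knot has a defined mean response.
    return [(lo + (k + 0.5) * width, sums[k] / counts[k])
            for k in range(n_bins) if counts[k]]


def apply_transform(knots, xi):
    """Score one value by linear interpolation between knots (flat beyond the ends)."""
    if xi <= knots[0][0]:
        return knots[0][1]
    if xi >= knots[-1][0]:
        return knots[-1][1]
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x0 <= xi <= x1:
            t = (xi - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)


# Synthetic non-linear relationship (assumed for illustration): y = x^2 on [-2, 2].
x = [i / 50 - 2 for i in range(201)]
y = [xi ** 2 for xi in x]

knots = fit_binned_transform(x, y, n_bins=10)
x_transformed = [apply_transform(knots, xi) for xi in x]

r_raw = pearson(x, y)              # near zero: the relationship is symmetric, not linear
r_new = pearson(x_transformed, y)  # much closer to one after the transformation
```

In this sketch the raw predictor is almost uncorrelated with the response, while the transformed predictor is strongly correlated with it, so a linear model fitted on the transformed value can capture the non-linear relationship. Scoring a new value only requires the stored knots, not the training data.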